Abstract

We study the compressed representation of a ranked tree by a (string) straight-line program
(SLP) for its preorder traversal, and compare it with the well-studied representation by
straight-line context-free tree grammars (which are also known as tree straight-line programs
or TSLPs). Although SLPs turn out to be exponentially more succinct than TSLPs, we show
that many simple tree queries can still be performed efficiently on SLPs, such as computing
the height and Horton-Strahler number of a tree, tree navigation, or evaluation of Boolean
expressions. Other problems on tree traversals turn out to be intractable, e.g. pattern matching
and evaluation of tree automata.


1 Introduction 

Grammar-based compression has become an active field in string compression during the past 20
years. The idea is to represent a given string $s$ by a small context-free grammar that generates only
$s$; such a grammar is also called a straight-line program (SLP). For instance, the word $(ab)^{1024}$ can
be represented by the SLP with the productions $A_0 \to ab$ and $A_i \to A_{i-1}A_{i-1}$ for $1 \le i \le 10$ ($A_{10}$
is the start symbol). The size of this SLP (the size of an SLP is usually defined as the total length
of all right-hand sides of the productions) is much smaller than the length of the string $(ab)^{1024}$. In
general, an SLP of size $n$ can produce a string of length $2^{\Omega(n)}$. Hence, an SLP can indeed be seen as
a succinct representation of the generated string. The goal of grammar-based string compression
is to construct from a given input string $s$ a small SLP that produces $s$. Several algorithms for this
have been proposed and analyzed. Prominent grammar-based string compressors are for instance
LZ78, RePair, and BISECTION, see [13] for more details. The theoretically best known polynomial
time grammar-based compressors [niisiiMiisn] approximate the size of a smallest SLP up to a
factor $O(\log(n/g))$, where $g$ is the size of a smallest SLP for the input string.

Motivated by applications where large tree structured data occur, like XML processing,
grammar-based compression has been extended to trees [SHnilMllss], see m for a survey. Unless
otherwise specified, a tree in this paper is always a rooted ordered tree over a ranked alphabet,
i.e., every node is labelled with a symbol and the rank of this symbol is equal to the number
of children of the node. This class of trees occurs in many different contexts, for instance
term rewriting, expression evaluation, tree automata, and functional programming. A tree over a
ranked alphabet is uniquely represented by its preorder traversal string. For instance, the preorder
traversal of the tree $f(g(a), f(a, b))$ is the string $fgafab$. It is now a natural idea to apply a string
compressor to this preorder traversal. In this paper we study the compression of ranked trees by
SLPs for their preorder traversals. This approach is very similar to [5], where unranked unlabelled
trees are compressed by SLPs for their balanced parenthesis representations. In [37] this idea is
used together with the grammar-based compressor RePair to get a new compressed suffix tree
implementation.

*The third and fourth author are supported by the DFG-project LO 748/10-1 (QUANT-KOMP). 
In Section 4 we compare the size of SLPs for preorder traversals with two other grammar-based
compressed tree representations: (i) the above mentioned SLPs for balanced parenthesis
representations from [5] and (ii) tree straight-line programs (TSLPs) [TUI dH [HI 133]. The latter directly
generalize string SLPs to trees using context-free tree grammars that produce a single tree, see
m for a survey. TSLPs generalize dags (directed acyclic graphs), which are widely used as a
compact tree representation. Whereas dags only allow sharing of repeated subtrees, TSLPs can
also share repeated internal tree patterns. In [18] it is shown that every tree of size $n$ over a
fixed ranked alphabet can be produced by a TSLP of size $O(n/\log n)$, which is worst-case optimal.

A grammar-based tree compressor based on TSLPs with an approximation ratio of $O(\log n)$ was
presented in [^. In cni, it was shown that from a given TSLP $\mathbb{A}$ of size $m$ for a tree $t$ one
can efficiently construct an SLP of size $O(m \cdot r)$ for the preorder traversal of $t$, where $r$ is the
maximal rank occurring in $t$ (i.e., the maximal number of children of a node). Hence, a smallest
SLP for the traversal of $t$ cannot be much larger than a smallest TSLP for $t$. Our first main result
(Theorem 7) shows that SLPs can be exponentially more succinct than TSLPs: We construct a
family of binary trees $t_n$ ($n > 0$) such that the size of a smallest SLP for the traversal of $t_n$ is
polynomial in $n$ but the size of a smallest TSLP for $t_n$ is $\Omega(2^{n/2})$. We also match this lower
bound by an upper bound: Given an SLP $\mathbb{A}$ of size $m$ for the traversal of a tree $t$ of height $h$
and maximal rank $r$, one can efficiently construct a TSLP for $t$ of size $O(m \cdot h \cdot r)$ (Theorem 8).
Finally, we construct a family of binary trees $t_n$ ($n > 0$) such that the size of a smallest SLP for
the preorder traversal of $t_n$ is polynomial in $n$ but the size of a smallest SLP for the balanced
parenthesis representation is $\Omega(2^{n/2})$ (Theorem 9). Hence, SLPs for preorder traversals can be
exponentially more succinct than SLPs for balanced parenthesis representations. It remains open
whether the opposite behavior is possible as well.

We also study algorithmic problems for trees that are encoded by SLPs. We extend some
of the results from [5] on querying SLP-compressed balanced parenthesis representations to our
context. Specifically, we show that after a linear time preprocessing we can navigate (i.e., move to
the parent node and the $k$-th child), compute lowest common ancestors and subtree sizes in time
$O(\log N)$, where $N$ is the size of the tree represented by the SLP (Theorem 10). For a couple
of other problems (computation of the height and depth of a node, computation of the
Horton-Strahler number, and evaluation of Boolean expressions) we provide at least polynomial time
algorithms for the case that the input tree is given by an SLP for the preorder traversal. On the
other hand, there exist problems that are polynomial time solvable for TSLP-compressed trees but
difficult for SLP-compressed trees: Examples for such problems are pattern matching, evaluation
of max-plus expressions, and membership for tree automata. Looking at tree automata is also
interesting when compared with the situation for explicitly given (i.e., uncompressed) preorder
traversals. For these, evaluating Boolean expressions (which is the membership problem for a
particular tree automaton) is NC$^1$-complete by a famous result of Buss [TT], and the NC$^1$ upper
bound was generalized to every fixed tree automaton [28]. If we compress the preorder traversal
by an SLP, the problem is still solvable in polynomial time for Boolean expressions (Theorem [TUll),
but there is a fixed tree automaton where the evaluation problem becomes PSPACE-complete
(Theorem jUSj).

Related work on tree compression. There are also tree compressors based on other grammar
formalisms. In [T] so-called elementary ordered tree grammars are used, and a polynomial time
compressor with an approximation ratio of $O(n^{5/6})$ is presented. Also the top dags from [7] can
be seen as a variation of TSLPs for unranked trees. Recently, in m it was shown that for every
tree of size $n$ with $\sigma$ many node labels, the top dag has size $O\!\left(\frac{n}{\log_\sigma n} \cdot \log\log_\sigma n\right)$, which improved the
bound from [7]. An extension of TSLPs to higher order tree grammars was proposed in m.

Another class of tree compressors uses succinct data structures for trees. Here, the goal is to
represent a tree in a number of bits that asymptotically matches the information theoretic lower
bound, and at the same time allows efficient querying (ideally in time $O(1)$) of the data structure.
For unlabelled unranked trees of size $n$ there exist representations with $2n + o(n)$ bits that support
navigation and some other tree queries in time $O(1)$ [6, 23, 24, 36]. This result has been extended
to labelled trees, where $(\log \sigma) \cdot n + 2n + o(n)$ bits suffice when $\sigma$ is the number of node labels [TB].

2 Preliminaries 

Let $\Sigma$ be a finite alphabet. For a string $w = a_1 \cdots a_n \in \Sigma^*$ we define $|w| = n$, $w[i] = a_i$ and
$w[i : j] = a_i \cdots a_j$, where $w[i : j] = \varepsilon$ if $i > j$. Let $w[: i] = w[1 : i]$ and $w[i :] = w[i : |w|]$.
With $\mathrm{rev}(w) = a_n \cdots a_1$ we denote $w$ reversed. Given two strings $u, v \in \Sigma^*$, the convolution
$u \otimes v \in (\Sigma \times \Sigma)^*$ is the string of length $\min\{|u|, |v|\}$ defined by $(u \otimes v)[i] = (u[i], v[i])$ for $1 \le i \le \min\{|u|, |v|\}$.
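The string notation above can be made concrete with a few lines of Python. This is only an illustrative sketch: the function names `factor`, `rev` and `convolution` are hypothetical, and strings are indexed from 1 as in the text (Python itself indexes from 0).

```python
def factor(w, i, j):
    """w[i : j] with 1-based, inclusive positions; the empty string if i > j."""
    return w[i - 1:j] if i <= j else ""


def rev(w):
    """The string w reversed."""
    return w[::-1]


def convolution(u, v):
    """u (x) v as a list of pairs; zip stops at min(len(u), len(v))."""
    return list(zip(u, v))


print(factor("abcde", 2, 4))        # positions 2..4 of abcde
print(rev("abc"))
print(convolution("10100", "10010"))
```

The pair sequence printed last is the convolution of the two example words that reappear in the proof of Theorem 9.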

2.1 Complexity classes 

We assume familiarity with the basic classes from complexity theory, in particular P, NP and
PSPACE. The following definitions are only needed in Section 5.3.3. The counting class #P contains
all functions $f : \Sigma^* \to \mathbb{N}$ for which there exists a nondeterministic polynomial time machine $M$
such that for every $x \in \Sigma^*$, $f(x)$ is the number of accepting computation paths of $M$ on input
$x$. The class PP (probabilistic polynomial time) contains all problems $A$ for which there exists a
nondeterministic polynomial time machine $M$ such that for every input $x$: $x \in A$ if and only if
more than half of all computation paths of $M$ on input $x$ are accepting. By a famous result of
Toda [41], the class $\mathrm{P}^{\mathrm{PP}} = \mathrm{P}^{\#\mathrm{P}}$ (i.e., the class of all languages that can be decided in deterministic
polynomial time with the help of an oracle from PP) contains the whole polynomial time hierarchy.
Hence, if a problem is PP-hard, then this can be seen as a strong indication that the problem
does not belong to the polynomial time hierarchy (otherwise the polynomial time hierarchy would
collapse).

The levels of the counting hierarchy $C_i^p$ ($i \ge 0$) are inductively defined as follows: $C_0^p = \mathrm{P}$
and $C_{i+1}^p = \mathrm{PP}^{C_i^p}$ (the set of languages accepted by a PP-machine as above with an oracle from
$C_i^p$) for all $i \ge 0$. Let $\mathrm{CH} = \bigcup_{i \ge 0} C_i^p$ be the counting hierarchy. It is not difficult to show that
$\mathrm{CH} \subseteq \mathrm{PSPACE}$, and most complexity theorists conjecture that $\mathrm{CH} \subsetneq \mathrm{PSPACE}$. Hence, if a problem
belongs to the counting hierarchy, then this can be seen as an indication that the problem is
probably not PSPACE-complete. The counting hierarchy can also be seen as an exponentially blown-up
version of the circuit complexity class DLOGTIME-uniform TC$^0$. This is the class of all languages
that can be decided with a constant-depth polynomial-size circuit family of unbounded fan-in that
in addition to normal Boolean gates may also use threshold gates. DLOGTIME-uniformity means
that one can compute in time $O(\log n)$ (i) the type of a given gate of the $n$-th circuit, and (ii)
whether two given gates of the $n$-th circuit are connected by a wire. Here, gates of the $n$-th circuit
are encoded by bit strings of length $O(\log n)$. More details on the counting hierarchy (resp., circuit
complexity) can be found in [3] (resp., jH]).

2.2 Trees 

A ranked alphabet $\mathcal{F}$ is a finite set of symbols where every symbol $f \in \mathcal{F}$ has a rank $\mathrm{rank}(f) \in \mathbb{N}$.
We assume that $\mathcal{F}$ contains at least one symbol of rank zero. By $\mathcal{F}_n$ we denote the symbols of
$\mathcal{F}$ of rank $n$. Later we will also allow ranked alphabets where $\mathcal{F}_0$ is infinite. For the purpose of
this paper, it is convenient to define trees as particular strings over the alphabet $\mathcal{F}$ (namely as
preorder traversals). The set $T(\mathcal{F})$ of all trees over $\mathcal{F}$ is the subset of $\mathcal{F}^*$ defined inductively as
follows: If $f \in \mathcal{F}_n$ with $n \ge 0$ and $t_1, \ldots, t_n \in T(\mathcal{F})$, then also $f t_1 \cdots t_n \in T(\mathcal{F})$.

We call a string $s \in \mathcal{F}^*$ a fragment if there exists a tree $t \in T(\mathcal{F})$ and a non-empty string
$x \in \mathcal{F}^+$ such that $sx = t$. Note that the empty string $\varepsilon$ is a fragment. Intuitively, a fragment is
a tree with gaps. The number of gaps of a fragment $s \in \mathcal{F}^+$ is formally defined as the number
$n$ of trees $t_1, \ldots, t_n \in T(\mathcal{F})$ such that $s t_1 \cdots t_n \in T(\mathcal{F})$, and is denoted by $\mathrm{gaps}(s)$. The number
of gaps of the empty string is defined as 0. The following lemma states that $\mathrm{gaps}(s)$ is indeed
well-defined.

Figure 1: The tree $t$ from Example 3 and the tree fragment corresponding to the fragment
$ffaafff$.

Lemma 1. The following statements hold:

• The set $T(\mathcal{F})$ is prefix-free, i.e. $t \in T(\mathcal{F})$ and $tv \in T(\mathcal{F})$ imply $v = \varepsilon$.

• If $t \in T(\mathcal{F})$, then every suffix of $t$ factors uniquely into a concatenation of strings from
$T(\mathcal{F})$.

• For every fragment $s \in \mathcal{F}^+$ there is a unique $n \ge 1$ such that $\{x \in \mathcal{F}^* \mid sx \in T(\mathcal{F})\} = (T(\mathcal{F}))^n$.

Since $T(\mathcal{F})$ is prefix-free we immediately get:

Lemma 2. For every $w \in \mathcal{F}^*$ there exist unique $n \ge 0$, $t_1, \ldots, t_n \in T(\mathcal{F})$ and a unique fragment
$s$ such that $w = t_1 \cdots t_n s$.

Let $w \in \mathcal{F}^*$ and let $w = t_1 \cdots t_n s$ as in Lemma 2. We define $c(w) = (n, \mathrm{gaps}(s))$. The number
$n$ counts the number of full trees in $w$ and $\mathrm{gaps}(s)$ is the number of trees missing to make the
fragment $s$ a tree, too.

For better readability, we occasionally write a tree $f t_1 \cdots t_n$ with $f \in \mathcal{F}_n$ and $t_1, \ldots, t_n \in T(\mathcal{F})$
as $f(t_1, \ldots, t_n)$, which corresponds to the standard term representation of trees. We also consider
trees in their graph-theoretic interpretation where the set of nodes of a tree $t$ is the set of positions
$\{1, \ldots, |t|\}$ of the string $t$. The root node is 1. If $t$ factorizes as $u f t_1 \cdots t_n v$ for $u, v \in \mathcal{F}^*$, $f \in \mathcal{F}_n$,
and $t_1, \ldots, t_n \in T(\mathcal{F})$, then the $n$ children of node $|u| + 1$ are $|u| + 2 + \sum_{i=1}^{k} |t_i|$ for $0 \le k \le n - 1$.
We define the depth of a node in $t$ (number of edges from the root to the node) and the height
of $t$ (maximal depth of a node) as usual. Note that the tree $t$ as a string is simply the preorder
traversal of the tree $t$ seen in its standard graph-theoretic interpretation.

Example 3. Let $t = ffaafffaaaa = f(f(a, a), f(f(f(a, a), a), a))$ be the tree depicted in Figure 1
with $f \in \mathcal{F}_2$ and $a \in \mathcal{F}_0$. Its height is 4. All prefixes (including the empty word, excluding the full
word) of $t$ are fragments. The fragment $s = ffaafff$ is also depicted in Figure 1 in a graphical
way. The dashed edges visualize the gaps. We have $\mathrm{gaps}(s) = 4$. For the factor $u = aafffa$ of $t$
we have $c(u) = (2, 3)$. The children of node 5 (the third $f$-labelled node) are 6 and 11.
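The values in Example 3 can be recomputed on the uncompressed string with a single left-to-right scan. The following is a minimal sketch for explicit strings only (it does not work on SLPs); the names `RANK`, `c`, `subtree_end` and `children` are hypothetical and chosen for this illustration.

```python
RANK = {"f": 2, "a": 0}  # ranked alphabet of Example 3


def c(w, rank=RANK):
    """Return (number of full trees, gaps of the trailing fragment) for w."""
    full, need = 0, 0  # need = trees still missing to close the open fragment
    for symbol in w:
        if need == 0:
            # a new tree or fragment starts here
            if rank[symbol] == 0:
                full += 1
            else:
                need = rank[symbol]
        else:
            # the symbol fills one gap and opens rank[symbol] new ones
            need += rank[symbol] - 1
            if need == 0:
                full += 1
    return full, need


def subtree_end(t, i, rank=RANK):
    """Last position (1-based) of the subtree of t rooted at node i."""
    need, j = 1, i
    while True:
        need += rank[t[j - 1]] - 1
        if need == 0:
            return j
        j += 1


def children(t, i, rank=RANK):
    """Children of node i in t, as 1-based positions from left to right."""
    result, pos = [], i + 1
    for _ in range(rank[t[i - 1]]):
        result.append(pos)
        pos = subtree_end(t, pos, rank) + 1
    return result


t = "ffaafffaaaa"
print(c(t))            # (1, 0): t is a single tree
print(c("ffaafff"))    # fragment with 4 gaps
print(c("aafffa"))     # two full trees, then a fragment with 3 gaps
print(children(t, 5))  # children of the third f-labelled node
```

A string $w$ is a tree exactly when `c(w)` returns `(1, 0)`, which is the criterion that Section 3 evaluates on the compressed representation.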

2.3 Straight-line programs 

A straight-line program, briefly SLP, is a context-free grammar that produces a single string.
Formally, it is a tuple $\mathbb{A} = (N, \Sigma, P, S)$, where $N$ is a finite set of nonterminals, $\Sigma$ is a finite set of
terminal symbols ($\Sigma \cap N = \emptyset$), $S \in N$ is the start nonterminal, and $P$ is a finite set of productions
(or rules) of the form $A \to w$ for $A \in N$, $w \in (N \cup \Sigma)^*$ such that:

• For every $A \in N$, there exists exactly one production of the form $A \to w$, and

• the binary relation $\{(A, B) \in N \times N \mid (A \to w) \in P,\ B \text{ occurs in } w\}$ is acyclic.

Every nonterminal $A \in N$ produces a unique string $\mathrm{val}_{\mathbb{A}}(A) \in \Sigma^*$. The string defined by $\mathbb{A}$ is
$\mathrm{val}(\mathbb{A}) = \mathrm{val}_{\mathbb{A}}(S)$. We usually omit the subscript $\mathbb{A}$ when the context is clear. The size of the
SLP $\mathbb{A}$ is $|\mathbb{A}| = \sum_{(A \to w) \in P} |w|$. One can transform an SLP $\mathbb{A} = (N, \Sigma, P, S)$ which produces a
nonempty word in linear time into Chomsky normal form, i.e. for each production $(A \to w) \in P$,
either $w \in \Sigma$ or $w = BC$ where $B, C \in N$.
For an SLP $\mathbb{A}$ of size $n$ we have $|\mathrm{val}(\mathbb{A})| \in 2^{O(n)}$ and there exists a family of SLPs $\mathbb{A}_n$ ($n \ge 1$)
such that $|\mathbb{A}_n| \in O(n)$ and $|\mathrm{val}(\mathbb{A}_n)| = 2^n$. Hence, SLPs allow exponential compression.
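To illustrate exponential compression, the sketch below encodes the doubling productions $A_0 \to ab$, $A_i \to A_{i-1}A_{i-1}$ from the introduction as a Python dictionary and compares the size of the SLP with the length of the produced string. The dictionary representation and the function names are assumptions made for this example; any symbol that is not a key of the dictionary is treated as a terminal.

```python
def doubling_slp(k):
    """Productions A0 -> ab and Ai -> A(i-1) A(i-1) for 1 <= i <= k."""
    rules = {"A0": ["a", "b"]}
    for i in range(1, k + 1):
        rules[f"A{i}"] = [f"A{i - 1}", f"A{i - 1}"]
    return rules, f"A{k}"


def slp_size(rules):
    """Total length of all right-hand sides."""
    return sum(len(rhs) for rhs in rules.values())


def val_length(rules, start):
    """Length of the produced string, computed without expanding it."""
    memo = {}

    def length(x):
        if x not in rules:  # terminal symbol
            return 1
        if x not in memo:
            memo[x] = sum(length(y) for y in rules[x])
        return memo[x]

    return length(start)


def expand(rules, start):
    """The produced string itself (exponentially long in general)."""
    memo = {}

    def val(x):
        if x not in rules:
            return x
        if x not in memo:
            memo[x] = "".join(val(y) for y in rules[x])
        return memo[x]

    return val(start)


rules, start = doubling_slp(10)
print(slp_size(rules), val_length(rules, start))  # 22 2048
```

Here an SLP of size 22 produces the word $(ab)^{1024}$ of length 2048; each additional production doubles the length while adding only 2 to the size.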

The following lemma summarizes known results about SLPs which we will use throughout the
paper, see e.g. [50].

Lemma 4. There are linear time algorithms for the following problems:

1. Given an SLP $\mathbb{A}$, compute the set of symbols occurring in $\mathrm{val}(\mathbb{A})$.

2. Given an SLP $\mathbb{A}$ with terminal alphabet $\Sigma$ and a subset $\Gamma \subseteq \Sigma$, compute the number of
occurrences of symbols from $\Gamma$ in $\mathrm{val}(\mathbb{A})$.

3. Given an SLP $\mathbb{A}$ with terminal alphabet $\Sigma$, a subset $\Gamma \subseteq \Sigma$, and a number $i$, compute the
position of the $i$-th occurrence of a symbol from $\Gamma$ in $\mathrm{val}(\mathbb{A})$ (if it exists).

4. Given an SLP $\mathbb{A}$ and $i, j \in \{1, \ldots, |\mathrm{val}(\mathbb{A})|\}$ where $i \le j$, compute an SLP for $\mathrm{val}(\mathbb{A})[i : j]$.
The size of the SLP for $\mathrm{val}(\mathbb{A})[i : j]$ is bounded by $O(|\mathbb{A}|)$.
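Points 2 and 3 of Lemma 4 can be sketched as follows. This is one straightforward way to obtain such algorithms and is not claimed to be the construction of the cited reference: a bottom-up pass stores, for every nonterminal, the length of its string and the number of symbols from $\Gamma$ in it, and a top-down walk then locates the $i$-th occurrence. The dictionary representation of the SLP and the names `preprocess` and `select` are assumptions of this example.

```python
def preprocess(rules, gamma):
    """Bottom-up tables: length and number of gamma-symbols of every val(X)."""
    length, count = {}, {}

    def visit(x):
        if x in length:
            return
        if x not in rules:  # terminal symbol
            length[x], count[x] = 1, int(x in gamma)
            return
        for y in rules[x]:
            visit(y)
        length[x] = sum(length[y] for y in rules[x])
        count[x] = sum(count[y] for y in rules[x])

    for x in rules:
        visit(x)
    return length, count


def select(rules, start, gamma, i):
    """1-based position of the i-th gamma-symbol in val(start), or None."""
    length, count = preprocess(rules, gamma)
    if not 1 <= i <= count[start]:
        return None
    x, offset = start, 0
    while x in rules:
        for y in rules[x]:
            if i <= count[y]:  # the occurrence lies inside val(y)
                x = y
                break
            i -= count[y]      # skip val(y) completely
            offset += length[y]
    return offset + 1


# An SLP in Chomsky normal form for the tree ffaafffaaaa of Example 3.
RULES = {
    "F": ["f"], "A": ["a"],
    "X": ["A", "A"],   # aa
    "Y": ["F", "X"],   # faa
    "P": ["F", "Y"],   # ffaa
    "Q": ["F", "P"],   # fffaa
    "R": ["Q", "X"],   # fffaaaa
    "S": ["P", "R"],   # ffaafffaaaa
}

length, count = preprocess(RULES, {"a"})
print(length["S"], count["S"])        # 11 symbols, 6 of them are leaves
print(select(RULES, "S", {"a"}, 3))   # position of the third a
print(select(RULES, "S", {"f"}, 3))   # position of the third f
```

The preprocessing touches every right-hand side once, and each query follows a single root-to-leaf path in the derivation tree.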

2.4 Tree straight-line programs 

We now define tree straight-line programs. Let $\mathcal{F}$ and $\mathcal{V}$ be two disjoint ranked alphabets, where we
call elements from $\mathcal{F}$ terminals and elements from $\mathcal{V}$ nonterminals. Let further $\mathcal{X} = \{x_1, x_2, \ldots\}$
be a countably infinite set of parameters (disjoint from $\mathcal{F}$ and $\mathcal{V}$), which we treat as symbols of
rank zero. In the following we consider trees over $\mathcal{F} \cup \mathcal{V} \cup \mathcal{X}$. The size $|t|$ of such a tree $t$ is defined
as the number of nodes labelled by a symbol from $\mathcal{F} \cup \mathcal{V}$, i.e. we do not count parameter nodes.
A tree straight-line program $\mathbb{A}$, or TSLP for short, is a tuple $\mathbb{A} = (\mathcal{V}, \mathcal{F}, P, S)$, where $\mathcal{V}$ is the set of
nonterminals, $\mathcal{F}$ is the set of terminals, $S \in \mathcal{V}_0$ is the start nonterminal and $P$ is a finite set of
productions of the form $A(x_1, \ldots, x_n) \to t$ (which is also briefly written as $A \to t$), where $n \ge 0$,
$A \in \mathcal{V}_n$ and $t \in T(\mathcal{F} \cup \mathcal{V} \cup \{x_1, \ldots, x_n\})$ is a tree in which every parameter $x_i$ ($1 \le i \le n$) occurs
at most once, such that:

• For every $A \in \mathcal{V}_n$ there exists exactly one production of the form $A(x_1, \ldots, x_n) \to t$, and

• the binary relation $\{(A, B) \in \mathcal{V} \times \mathcal{V} \mid (A \to t) \in P,\ B \text{ is a label in } t\}$ is acyclic.

These conditions ensure that exactly one tree $\mathrm{val}_{\mathbb{A}}(A) \in T(\mathcal{F} \cup \{x_1, \ldots, x_n\})$ is derived from every
nonterminal $A \in \mathcal{V}_n$ by using the rules as rewriting rules in the usual sense. As for SLPs, we omit
the subscript $\mathbb{A}$ when the context is clear. The tree defined by $\mathbb{A}$ is $\mathrm{val}(\mathbb{A}) = \mathrm{val}_{\mathbb{A}}(S)$. The size $|\mathbb{A}|$
of a TSLP $\mathbb{A} = (\mathcal{V}, \mathcal{F}, P, S)$ is $|\mathbb{A}| = \sum_{(A \to t) \in P} |t|$. We call a TSLP monadic if every nonterminal
has rank at most one. One can transform every TSLP $\mathbb{A}$ into a monadic one of size $O(|\mathbb{A}| \cdot r)$,
where $r$ is the maximal rank of a terminal in $\mathbb{A}$ [34]. TSLPs where every nonterminal has rank 0
correspond to dags (the nodes of the dag are the nonterminals of the TSLP).

For a TSLP $\mathbb{A}$ of size $n$ we have $|\mathrm{val}(\mathbb{A})| \in 2^{O(n)}$ and there exists a family of TSLPs $\mathbb{A}_n$ ($n \ge 1$)
such that $|\mathbb{A}_n| \in O(n)$ and $|\mathrm{val}(\mathbb{A}_n)| = 2^n$. Hence, analogously to SLPs, TSLPs allow exponential
compression. One can also define nonlinear TSLPs where parameters can occur multiple times
on right-hand sides; these can achieve doubly exponential compression but have the disadvantage
that many algorithmic problems become more difficult, see e.g. [32].

For every word $w$ (resp., tree $t$) there exists a smallest SLP (resp., TSLP) $\mathbb{A}$. It is known that,
unless P = NP, there is no polynomial time algorithm that finds a smallest SLP (resp., TSLP) for
a given word (resp., tree).

3 Checking whether an SLP produces a tree 

In this section we show that, given an SLP $\mathbb{A}$ and a ranked alphabet $\mathcal{F}$, we can verify in time
linear in $|\mathbb{A}|$ whether $\mathrm{val}(\mathbb{A}) \in T(\mathcal{F})$. In other words, we present a linear time algorithm for the
compressed membership problem for the language $T(\mathcal{F}) \subseteq \mathcal{F}^*$. We remark that $T(\mathcal{F})$ is a
context-free language, which can be seen by considering the grammar with productions $S \to f S^n$ for all
symbols $f \in \mathcal{F}_n$. In general the compressed membership problem for context-free languages can be
solved in PSPACE and there exists a deterministic context-free language with a PSPACE-complete
compressed membership problem [12, 29].


Theorem 5. Given an SLP $\mathbb{A}$, one can check in time $O(|\mathbb{A}|)$ whether $\mathrm{val}(\mathbb{A}) \in T(\mathcal{F})$.


Proof. Let $\mathbb{A} = (N, \mathcal{F}, P, S)$ be in Chomsky normal form and let $A \in N$. Due to Lemma 2
we know that $\mathrm{val}(A)$ is the concatenation of trees and a (possibly empty) fragment. Define
$c(A) := c(\mathrm{val}(A))$. Then $\mathrm{val}(\mathbb{A}) \in T(\mathcal{F})$ if and only if $c(S) = (1, 0)$. Hence, it suffices to compute
$c(A)$ for all nonterminals $A \in N$. We do this bottom-up. If $(A \to f) \in P$ with $f \in \mathcal{F}_n$, then we
have

$$c(A) = \begin{cases} (1, 0) & \text{if } n = 0 \\ (0, n) & \text{otherwise.} \end{cases}$$

Now consider a nonterminal $A$ with the rule $(A \to BC) \in P$, and let $c(B) = (b_1, b_2)$, $c(C) = (c_1, c_2)$.
We claim that

$$c(A) = \begin{cases} (b_1 + c_1 - \max\{1, b_2\} + 1,\ c_2) & \text{if } b_2 \le c_1 \\ (b_1,\ c_2 + b_2 - c_1 - \min\{1, c_2\}) & \text{otherwise.} \end{cases}$$

Let $\mathrm{val}(B) = t_1 \cdots t_{b_1} s$ and $\mathrm{val}(C) = t'_1 \cdots t'_{c_1} s'$, where $t_1, \ldots, t_{b_1}, t'_1, \ldots, t'_{c_1} \in T(\mathcal{F})$ and $s$ (resp.,
$s'$) is a fragment with $\mathrm{gaps}(s) = b_2$ (resp., $\mathrm{gaps}(s') = c_2$). We distinguish two cases:

Case $b_2 \le c_1$: If $b_2 \ge 1$, then the string $s t'_1 \cdots t'_{b_2}$ is a tree, and thus $\mathrm{val}(A)$ contains $b_1 + 1 + (c_1 - b_2)$
full trees and the fragment $s'$ with $c_2$ many gaps. On the other hand, if $b_2 = 0$, then $\mathrm{val}(A)$ contains
$b_1 + c_1$ many full trees.

Case $b_2 > c_1$: The trees $t'_1, \ldots, t'_{c_1}$ fill $c_1$ many gaps of $s$, and if $s' \ne \varepsilon$, then the fragment $s'$ fills
one more gap, while creating another $c_2$ gaps. In total there are $b_2 - (c_1 + 1) + c_2$ gaps if $c_2 > 0$
and $b_2 - c_1$ gaps if $c_2 = 0$. □
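The bottom-up computation from the proof can be sketched in a few lines. The encoding is an assumption of this example: an SLP in Chomsky normal form is a dictionary that maps a nonterminal either to a terminal symbol (a string) or to a pair of nonterminals, and the names `combine`, `c_values` and `produces_tree` are hypothetical.

```python
def combine(cb, cc):
    """c(A) for a rule A -> BC, given c(B) = cb and c(C) = cc."""
    b1, b2 = cb
    c1, c2 = cc
    if b2 <= c1:
        return (b1 + c1 - max(1, b2) + 1, c2)
    return (b1, c2 + b2 - c1 - min(1, c2))


def c_values(rules, rank):
    """Compute c(A) for every nonterminal A, each rule being evaluated once."""
    c = {}

    def visit(a):
        if a not in c:
            rhs = rules[a]
            if isinstance(rhs, str):  # rule A -> f with f of rank n
                n = rank[rhs]
                c[a] = (1, 0) if n == 0 else (0, n)
            else:                     # rule A -> BC
                c[a] = combine(visit(rhs[0]), visit(rhs[1]))
        return c[a]

    for a in rules:
        visit(a)
    return c


def produces_tree(rules, rank, start):
    """True iff the string produced from start is a tree."""
    return c_values(rules, rank)[start] == (1, 0)


RANK = {"f": 2, "a": 0}
RULES = {
    "F": "f", "A": "a",
    "X": ("A", "A"),   # aa
    "Y": ("F", "X"),   # faa
    "P": ("F", "Y"),   # ffaa
    "Q": ("F", "P"),   # fffaa
    "R": ("Q", "X"),   # fffaaaa
    "S": ("P", "R"),   # ffaafffaaaa, the tree of Example 3
}

print(c_values(RULES, RANK)["Q"])        # fragment fffaa: no full tree, 2 gaps
print(produces_tree(RULES, RANK, "S"))   # True
print(produces_tree(RULES, RANK, "P"))   # False: ffaa is only a fragment
```

The recursion is used only for brevity; for deep grammars an explicit topological order over the nonterminals would avoid Python's recursion limit.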


4 SLPs for traversals versus other grammar-based tree representations

In this section, we compare the worst case size of SLPs for traversals with the following two
grammar-based tree representations:

• TSLPs, and

• SLPs for balanced parenthesis sequences [5].

4.1 SLPs for traversals versus TSLPs 

In [To] it is shown that a TSLP $\mathbb{A}$ producing a tree $t \in T(\mathcal{F})$ can always be transformed into
an SLP of size $O(|\mathbb{A}| \cdot r)$ producing $t$, where $r$ is the maximal rank of a label occurring in $t$. So,
for binary trees the size at most doubles. In this section we will discuss the other direction, i.e.
transforming an SLP into a TSLP. Let $a$ be a symbol of rank 0 and let $f_n$ be a symbol of rank $n$ for
each $n \in \mathbb{N}$. Now let $t_n$ be the tree $f_n a^n$ and consider the family of trees $(t_n)_{n \in \mathbb{N}}$ with unbounded
rank. The size of the smallest TSLP for $t_n$ is $n + 1$, whereas the size of the smallest SLP for $t_n$
is in $O(\log n)$. It is less obvious that such an exponential gap can also be realized with trees of
bounded rank. In the following we construct a family of binary trees $(t_n)_{n \in \mathbb{N}}$ where a smallest
TSLP for $t_n$ is exponentially larger than the size of a smallest SLP for $t_n$. Afterwards we show
that it is always possible to transform an SLP $\mathbb{A}$ for $t$ into a TSLP of size $O(|\mathbb{A}| \cdot h \cdot r)$ for $t$, where
$h$ is the height of $t$ and $r$ is the maximal rank of a label occurring in $t$.
Figure 2: The comb tree $t(u, v)$ for $u = i_1 \cdots i_n$ and $v = j_1 \cdots j_n$


4.1.1 Worst-case comparison of SLPs and TSLPs 

We use the following result from [S] for the previously mentioned worst-case construction of a
family of binary trees:

Theorem 6 (Thm. 2 from j^). For every $n > 0$, there exist words $u_n, v_n \in \{0, 1\}^*$ with $|u_n| = |v_n|$
such that $u_n$ and $v_n$ have SLPs of size $n^{O(1)}$, but the smallest SLP for the convolution $u_n \otimes v_n$
has size $\Omega(2^{n/2})$.¹

For two given words $u = i_1 \cdots i_n \in \{0, 1\}^*$ and $v = j_1 \cdots j_n \in \{0, 1\}^*$ we define the comb tree

$$t(u, v) = f_{i_1}(f_{i_2}(\cdots f_{i_n}(\$, j_n) \cdots, j_2), j_1)$$

over the ranked alphabet $\{f_0, f_1, 0, 1, \$\}$ where $f_0, f_1$ have rank 2 and $0, 1, \$$ have rank 0. See
Figure 2 for an illustration.
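The comb tree has a very regular preorder traversal: reading the definition from the outside in gives $f_{i_1} \cdots f_{i_n}\, \$\, j_n \cdots j_1$, i.e. the $f$-symbols indexed by $u$, then $\$$, then $v$ reversed. The sketch below checks this on a small example by building the term explicitly. Symbols are represented as Python strings such as `"f1"`, and the function names are hypothetical.

```python
def comb_term(u, v):
    """The comb tree t(u, v) as a nested tuple (label, child, ...)."""
    assert len(u) == len(v)
    term = ("$",)
    # build from the innermost node f_{i_n}($, j_n) outwards
    for i, j in zip(reversed(u), reversed(v)):
        term = ("f" + i, term, (j,))
    return term


def preorder(term):
    """Preorder traversal of a nested-tuple term as a list of labels."""
    labels = [term[0]]
    for child in term[1:]:
        labels.extend(preorder(child))
    return labels


def comb_preorder(u, v):
    """Closed form of the traversal: f_{i_1} ... f_{i_n} $ rev(v)."""
    return ["f" + i for i in u] + ["$"] + list(reversed(v))


u, v = "101", "110"
print(preorder(comb_term(u, v)))
print(comb_preorder(u, v) == preorder(comb_term(u, v)))  # True
```

Because the traversal is a concatenation of a relabelled copy of $u$, one extra symbol, and the reverse of $v$, small SLPs for $u$ and $v$ directly give a small SLP for the traversal.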

Theorem 7. For every $n > 0$ there exists a tree $t_n$ such that the size of a smallest SLP for $t_n$ is
polynomial in $n$, but the size of a smallest TSLP for $t_n$ is in $\Omega(2^{n/2})$.

Proof. Let us fix an $n$ and let $u_n$ and $v_n$ be the aforementioned strings from Theorem 6. Let
$|u_n| = |v_n| = m$. Consider the comb tree $t_n := t(u_n, v_n)$. Note that $t_n = f_{i_1} \cdots f_{i_m}\, \$\, \mathrm{rev}(v_n)$,
where $u_n = i_1 \cdots i_m$. By Theorem 6 there exist SLPs of size $n^{O(1)}$ for $u_n$ and $v_n$, and these SLPs
easily yield an SLP of size $n^{O(1)}$ for $t_n$.

Next, we show that a TSLP $\mathbb{A}$ for $t_n$ yields an SLP of size $O(|\mathbb{A}|)$ for the string $u_n \otimes v_n$. Since
a smallest SLP for $u_n \otimes v_n$ has size $\Omega(2^{n/2})$ by Theorem 6, the same bound must hold for the size
of a smallest TSLP for $t_n$.

Let $\mathbb{A}$ be a TSLP for $t_n$. By [33] we can transform $\mathbb{A}$ into a monadic TSLP $\mathbb{A}'$ for $t_n$ of size
$O(|\mathbb{A}|)$. We transform the TSLP $\mathbb{A}'$ into an SLP of the same size for $u_n \otimes v_n$. We can assume
that every nonterminal except for the start nonterminal $S$ occurs in a right-hand side and every
nonterminal occurs in the derivation starting from $S$. At first we delete all rules of the form
$A \to j$ ($j \in \{0, 1\}$) and replace the occurrences of $A$ by $j$ in all right-hand sides. Now every
nonterminal $A \ne S$ of rank 0 derives a subtree of $t_n$ that contains the unique $\$$-leaf of $t_n$.
Hence, $t_n$ contains a unique subtree $\mathrm{val}(A)$. This implies that $A$ occurs exactly once in a
right-hand side. We can therefore without size increase replace this occurrence of $A$ by the right-hand
side of $A$. After this step, $S$ is the only rank-0 nonterminal in the TSLP. With the same argument,
we can also eliminate rank-1 nonterminals that derive a tree containing the unique leaf $\$$. After
this step, every rank-1 nonterminal $A(x)$ derives a tree of the form $g_1(g_2(\cdots(g_k(x, j_k)) \cdots, j_2), j_1)$
($g_i \in \{f_0, f_1\}$ and $j_i \in \{0, 1\}$).

Now, if a right-hand side contains a subtree $f_i(s_1, s_2)$, then $s_2$ must be either 0 or 1. Similarly,
for every occurrence of $j \in \{0, 1\}$ in a right-hand side, the parent node of that occurrence must
be labelled with either $f_0$ or $f_1$ (note that the parent node exists and cannot be a nonterminal).
Therefore we can obtain an SLP for $u_n \otimes v_n$ by replacing every production $A(x) \to t(x)$ by
$A \to \lambda(t(x))$, where $\lambda(t(x))$ is the string obtained inductively by $\lambda(x) = \varepsilon$, $\lambda(B(s(x))) = B\,\lambda(s(x))$
for nonterminals $B$, and $\lambda(f_i(s(x), j)) = (i, j)\,\lambda(s(x))$. The production for $S$ must be of the form
$S \to t(\$)$ for a term $t(x)$ and we replace it by $S \to \lambda(t(x))\,\$$. □

¹ Actually, the cited result is not stated for the convolution $u_n \otimes v_n$ but for the literal shuffle of $u_n$ and $v_n$, which
is $u_n[1]v_n[1]u_n[2]v_n[2] \cdots u_n[m]v_n[m]$. But this makes no difference, since the sizes of the smallest SLPs for the
convolution and literal shuffle, respectively, of two words differ only by multiplicative constants.
4.1.2 Conversion of SLPs to TSLPs

Note that the height of the tree $t_n$ in Theorem 7 is linear in the size of $t_n$. By the following result,
large height and rank are always responsible for the exponential succinctness gap between SLPs
and TSLPs.

Theorem 8. Let $t \in T(\mathcal{F})$ be a tree of height $h$ and maximal rank $r$, and let $\mathbb{A}$ be an SLP for $t$
with $|\mathbb{A}| = m$. Then there exists a TSLP $\mathbb{B}$ with $\mathrm{val}(\mathbb{B}) = t$ such that $|\mathbb{B}| \in O(m \cdot h \cdot r)$, which can
be constructed in time $O(m \cdot h \cdot r)$.

Proof. Without loss of generality we assume that $\mathbb{A}$ is in Chomsky normal form. For every
nonterminal $A$ of $\mathbb{A}$ with $c(A) = (a_1, a_2)$ we introduce $a_1$ nonterminals $A_1, \ldots, A_{a_1}$ of rank 0 (these
produce one tree each) and, if $a_2 > 0$, one nonterminal $A'$ of rank $a_2$ for the fragment encoded
by $A$. For every rule of the form $A \to f$ with $f \in \mathcal{F}_n$ we add to $\mathbb{B}$ the TSLP-rule $A_1 \to f$ if
$n = 0$ or $A'(x_1, \ldots, x_n) \to f(x_1, \ldots, x_n)$ if $n \ge 1$. Now consider a rule of the form $A \to BC$ with
$c(B) = (b_1, b_2)$ and $c(C) = (c_1, c_2)$.

Case 1: If $b_2 = 0$ we add the following rules to $\mathbb{B}$:

$$\begin{aligned}
A_i &\to B_i && \text{for } 1 \le i \le b_1 \\
A_{b_1 + i} &\to C_i && \text{for } 1 \le i \le c_1 \\
A'(x_1, \ldots, x_{c_2}) &\to C'(x_1, \ldots, x_{c_2}) && \text{if } c_2 > 0
\end{aligned}$$

Case 2: If $0 < b_2 \le c_1$ we add the following rules to $\mathbb{B}$:

$$\begin{aligned}
A_i &\to B_i && \text{for } 1 \le i \le b_1 \\
A_{b_1 + 1} &\to B'(C_1, \ldots, C_{b_2}) \\
A_{b_1 + 1 + i} &\to C_{b_2 + i} && \text{for } 1 \le i \le c_1 - b_2 \\
A'(x_1, \ldots, x_{c_2}) &\to C'(x_1, \ldots, x_{c_2}) && \text{if } c_2 > 0
\end{aligned}$$

Case 3: If $b_2 > c_1$ we add the following rules to $\mathbb{B}$, where $d = b_2 - c_1$:

$$\begin{aligned}
A_i &\to B_i && \text{for } 1 \le i \le b_1 \\
A'(x_1, \ldots, x_d) &\to B'(C_1, \ldots, C_{c_1}, x_1, \ldots, x_d) && \text{if } c_2 = 0 \\
A'(x_1, \ldots, x_{c_2 + d - 1}) &\to B'(C_1, \ldots, C_{c_1}, C'(x_1, \ldots, x_{c_2}), x_{c_2 + 1}, \ldots, x_{c_2 + d - 1}) && \text{if } c_2 > 0
\end{aligned}$$

Chain productions, where the right-hand side consists of a single nonterminal, can be eliminated
without size increase. Then, only one of the above productions remains and its size is bounded by
$c_1 + 2$ (recall that we do not count parameters). Recall that $c_1$ is the number of complete trees
produced by $C$. It therefore suffices to show that the number of complete trees of a factor $s$ of $t$
is bounded by $h \cdot r$, where $h$ is the height of $t$ and $r$ is the maximal rank of a label in $t$. Assume
that $s = t[i : j] = t_1 \cdots t_n s'$, where $t_l \in T(\mathcal{F})$ and $s'$ is a fragment. Let $k$ be the lowest common
ancestor of $i$ and $j$. If $k = i$ (i.e., $i$ is an ancestor of $j$) then either $s = t_1$ or $s = s'$. Otherwise,
the root of every tree $t_l$ ($1 \le l \le n$) is a child of a node on the path from $i$ to $k$. The length of
the path from $i$ to $k$ is bounded by $h$, hence $n \le h \cdot r$. □
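The counting argument at the end of the proof can be checked by brute force on a small explicit tree. The sketch below is only a sanity check for one example and not part of the construction; `full_trees`, `height` and `max_full_trees` are hypothetical helper names, and the tree is the one from Example 3.

```python
RANK = {"f": 2, "a": 0}


def full_trees(w, rank=RANK):
    """Number of complete trees at the start of w (first component of c(w))."""
    full, need = 0, 0
    for symbol in w:
        if need == 0:
            if rank[symbol] == 0:
                full += 1
            else:
                need = rank[symbol]
        else:
            need += rank[symbol] - 1
            if need == 0:
                full += 1
    return full


def height(t, rank=RANK):
    """Maximal depth of a node of the tree with preorder traversal t."""
    open_children = []  # missing children of each unfinished ancestor
    best = 0
    for symbol in t:
        best = max(best, len(open_children))  # depth of the current node
        if rank[symbol] > 0:
            open_children.append(rank[symbol])
        else:
            # a finished subtree may complete several ancestors in a row
            while open_children:
                open_children[-1] -= 1
                if open_children[-1] > 0:
                    break
                open_children.pop()
    return best


def max_full_trees(t):
    """Maximum number of complete trees over all non-empty factors of t."""
    return max(full_trees(t[i:j])
               for i in range(len(t))
               for j in range(i + 1, len(t) + 1))


t = "ffaafffaaaa"
h, r = height(t), max(RANK.values())
print(h, r, max_full_trees(t))  # the last value does not exceed h * r
```

For this tree the factor $aaaa$ at positions 8 to 11 attains the maximum.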

4.2 SLPs for traversals versus balanced parenthesis sequences 

Balanced parenthesis sequences are widely used as a succinct representation of ordered unranked
unlabeled trees [55]. One defines the balanced parenthesis sequence $\mathrm{bp}(t)$ of such a tree $t$
inductively as follows. If $t$ consists of a single node, then $\mathrm{bp}(t) = ()$. If the root of $t$ has $n$ children
in which the subtrees $t_1, \ldots, t_n$ are rooted (from left to right), then $\mathrm{bp}(t) = (\mathrm{bp}(t_1) \cdots \mathrm{bp}(t_n))$.
Hence, a tree with $n$ nodes is represented by $2n$ bits, which is optimal in the information theoretic
sense. On the other hand, an unlabelled full binary tree $t$ (i.e., a tree where every non-leaf node
has exactly two children) of size $n$ can be represented with $n$ bits by viewing $t$ as a ranked tree
over $\mathcal{F} = \{a, f\}$, where $f$ has rank two and $a$ has rank zero.

Theorem 9. For every $n > 0$ there exists a full binary tree $t_n$ such that the size of a smallest
SLP for $t_n$ is polynomial in $n$, but the size of a smallest SLP for $\mathrm{bp}(t_n)$ is in $\Omega(2^{n/2})$.

Proof. Let us fix an $n$ and let $u_n, v_n \in \{0, 1\}^*$ be the strings from Theorem 6. Let $|u_n| = |v_n| = m$.
We define $t_n$ by

$$t_n = \varphi_1(\mathrm{rev}(u_n))\, a\, \varphi_2(v_n)$$

where $\varphi_1, \varphi_2 : \{0, 1\}^* \to \{a, f\}^*$ are the homomorphisms defined as follows:

$$\begin{aligned}
\varphi_1(0) &= f & \varphi_2(0) &= a \\
\varphi_1(1) &= faf & \varphi_2(1) &= faa
\end{aligned}$$

It is easy to see that $t_n$ is indeed a tree (note that the string $\varphi_2(v_n)$ is a sequence of $m$ many
trees). From the SLPs for $u_n$ and $v_n$ we obtain an SLP for $t_n$ of size polynomial in $n$. It remains
to show that the smallest SLP for $\mathrm{bp}(t_n)$ has size $\Omega(2^{n/2})$. To do so, we show that from an SLP
for $\mathrm{bp}(t_n)$ we can obtain with a linear size increase an SLP for the convolution of $u_n$ and $v_n$. In
fact, we show the following claim:

Claim. The convolution $u_n \otimes v_n$ can be obtained from a suffix of $\mathrm{bp}(t_n)$ by a fixed rational
transformation (i.e., a deterministic finite automaton that outputs along every transition a finite
word over some output alphabet).

This claim proves the theorem using the following two facts:

• An SLP for a suffix of a string $\mathrm{val}(\mathbb{A})$ (for an SLP $\mathbb{A}$) can be produced by an SLP of size
$O(|\mathbb{A}|)$ by point 4 of Lemma 4.

• For every fixed rational transformation $\rho$, an SLP for $\rho(\mathrm{val}(\mathbb{A}))$ can be produced by an SLP
of size $O(|\mathbb{A}|)$ [SJ Theorem 1] (the $O$-constant depends on the rational transformation).

To see why the above claim holds, it is best to look at an example. Assume that $u_n = 10100$
and $v_n = 10010$. Hence, we have

$$t_n = \varphi_1(\mathrm{rev}(u_n))\, a\, \varphi_2(v_n) = f\, f\, faf\, f\, faf\ a\ faa\, a\, a\, faa\, a.$$
This tree is shown in Figure 3. We have

$$\mathrm{bp}(t_n) = \underbrace{(}_{0}\ \underbrace{(}_{0}\ \underbrace{(()(}_{1}\ \underbrace{(}_{0}\ \underbrace{(()(}_{1}\ ()\ \underbrace{(()())))}_{(1,1)}\ \underbrace{())}_{(0,0)}\ \underbrace{()))}_{(1,0)}\ \underbrace{(()()))}_{(0,1)}\ \underbrace{())}_{(0,0)}.$$

Indeed, $\mathrm{bp}(t_n)$ starts with an encoding of the string $\mathrm{rev}(u_n)$ (here 00101) via the correspondence
0 ≙ ( and 1 ≙ (()(, followed by () (which encodes the single $a$ between $\varphi_1(\mathrm{rev}(u_n))$ and $\varphi_2(v_n)$
in $t_n$), followed by the desired encoding of the convolution $u_n \otimes v_n$. The latter is encoded by the
following correspondence:


(0, 0) ≙ ())
(1, 0) ≙ ()))
(0, 1) ≙ (()()))
(1, 1) ≙ (()())))

So, a 0 (resp., 1) in the second component is encoded by () (resp., (()())), which corresponds to
the tree $a$ (resp., $faa$). A 0 (resp., 1) in the first component is encoded by one (resp., two) closing
parentheses.

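Note that each of the strings ()), ())), (()())), (()()))) begins with an opening parenthesis and ends
with a block of one or two closing parentheses. Although ()) is a prefix of ())), a deterministic
rational transducer can therefore tell these strings apart by inspecting the symbol that follows the
first closing parenthesis of this block, and replace them by the convoluted symbols (0, 0), (1, 0),
(0, 1), and (1, 1), respectively. This shows the above claim. □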
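The example from the proof can be replayed mechanically. The sketch below builds $t_n$ for $u_n = 10100$ and $v_n = 10010$, computes its balanced parenthesis sequence, strips the prefix that encodes $\mathrm{rev}(u_n)$ and the separating $()$, and decodes the remaining suffix back into the convolution. It works on explicit strings and replaces the transducer by a small hand-written scanner; all names are hypothetical.

```python
PHI1 = {"0": "f", "1": "faf"}
PHI2 = {"0": "a", "1": "faa"}
U_CODE = {"0": "(", "1": "(()("}  # how rev(u) shows up at the start of bp(t)


def build_tree(u, v):
    """t = phi1(rev(u)) a phi2(v) as a preorder string over {f, a}."""
    return "".join(PHI1[x] for x in reversed(u)) + "a" + "".join(PHI2[x] for x in v)


def bp(t):
    """Balanced parenthesis sequence of a tree over f (rank 2) and a (rank 0)."""
    out, pending = [], []  # pending[k] = missing children of the k-th open node
    for symbol in t:
        if symbol == "f":
            out.append("(")
            pending.append(2)
        else:
            out.append("()")
            # a finished subtree closes every ancestor that is now complete
            while pending:
                pending[-1] -= 1
                if pending[-1] > 0:
                    break
                pending.pop()
                out.append(")")
    return "".join(out)


def decode_convolution(suffix):
    """Read code words: () or (()()) followed by one or two closing parentheses."""
    pairs, p = [], 0
    while p < len(suffix):
        if suffix.startswith("(()())", p):
            second, p = "1", p + 6
        elif suffix.startswith("()", p):
            second, p = "0", p + 2
        else:
            raise ValueError("not a code word")
        closes = 0
        while p < len(suffix) and suffix[p] == ")":
            closes, p = closes + 1, p + 1
        if closes not in (1, 2):
            raise ValueError("not a code word")
        pairs.append(("0" if closes == 1 else "1", second))
    return pairs


u, v = "10100", "10010"
t = build_tree(u, v)
b = bp(t)
prefix = "".join(U_CODE[x] for x in reversed(u)) + "()"
print(t)
print(b)
print(decode_convolution(b[len(prefix):]) == list(zip(u, v)))  # True
```

The scanner decides the first component only after it has seen the whole block of closing parentheses, which is the one-symbol lookahead a transducer needs to separate ()) from ())).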

Theorem 9 can also be interpreted as follows: For every $n > 0$ there exists a full binary tree
$t_n$ such that the size of the smallest SLP for the depth-first-unary-degree-sequence (DFUDS; it
is defined in the proof of Theorem 10 below) of $t_n$ is polynomial in $n$, but the size of the smallest
SLP for the balanced parenthesis representation of $t_n$ is in $\Omega(2^{n/2})$. It remains open whether
there is also a tree family where the opposite situation arises.

5 Algorithmic problems on SLP-compressed trees 

In this section we study the complexity of several basic algorithmic problems on trees that are 
represented by SLPs. 

5.1 Efficient tree operations 

In [8] it is shown that for a given SLP $\mathbb{A}$ of size $n$ that produces the balanced parenthesis
representation of an unranked tree $t$ of size $N$, one can produce in time $O(n)$ a data structure of
size $O(n)$ that supports navigation as well as other important tree queries (e.g. lowest common
ancestor queries) in time $O(\log N)$. Here, the word RAM model is used, where memory cells can
store numbers with $\log N$ bits and arithmetic operations on $\log N$-bit numbers can be carried out
in constant time. An analogous result was shown in [THU] for top dags. Here, we show the same
result for SLPs that produce (preorder traversals of) ranked trees. Recall that we identify the
nodes of a tree $t$ with the positions $1, \ldots, |t|$ in the string $t$.

Theorem 10. Given an SLP $\mathbb{A}$ of size $n$ for a tree $t \in T(\mathcal{F})$ of size $N$, one can produce in
time $O(n)$ a data structure of size $O(n)$ that supports the following computations in time
$O(\log N) \le O(n)$ on a word RAM, where $i, j, k \in \mathbb{N}$ with $1 \le i, j \le N$ are given in binary
notation:

(a) Compute the parent node of node $i > 1$ in $t$.

(b) Compute the $k$-th child of node $i$ in $t$, if it exists.

(c) Compute the number $k$ such that $i > 1$ is the $k$-th child of its parent node.

(d) Compute the size of the subtree rooted at node $i$.

(e) Compute the lowest common ancestor of nodes $i$ and $j$ in $t$.

Proof. In [8], it is shown that for an SLP $\mathbb{A}$ of size $n$ that produces a well-parenthesized string
$w \in \{(,)\}^*$ of length $N$, one can produce in time $O(n)$ a data structure of size $O(n)$ that supports
the following computations in time $O(\log N)$ on a word RAM, where $1 \le k, j \le N$ are given
in binary notation and $b \in \{(,)\}$:

• Compute the number of positions $1 \le i \le k$ such that $w[i] = b$ ($\mathrm{rank}_b(k)$).

• Compute the position of the $k$-th occurrence of $b$ in $w$ if it exists ($\mathrm{select}_b(k)$).

• Compute the position of the matching closing (resp., opening) parenthesis for an opening
(resp., closing) parenthesis at position $k$ ($\mathrm{findclose}(k)$ and $\mathrm{findopen}(k)$).

• Compute the left-most position $i \in [k,j]$ having the smallest excess value in the interval
$[k,j]$, where the excess value at a position $i$ is $\mathrm{rank}_((i) - \mathrm{rank}_)(i)$ ($\mathrm{rmqi}(k,j)$).

Let us now take an SLP $\mathbb{A}$ of size $n$ for a tree $t \in T(\mathcal{F})$ of size $N$ and let $s$ be the
corresponding unlabelled tree. In [B], the DFUDS-representation (DFUDS for depth-first-unary-degree-sequence)
of $s$ is defined as follows: Walk over the tree in preorder and write down for every node
with $d$ children the string $(^d)$ ($d$ opening parentheses followed by a closing parenthesis). Finally
put an additional opening parenthesis at the beginning of the resulting string, which yields a
well-parenthesized string. For instance, for the tree $g(f(a,a),a,h(a))$ we obtain the DFUDS-representation
( ((() (() ) ) ) () ). Clearly, from the SLP $\mathbb{A}$ we can produce an SLP $\mathbb{B}$ for the
DFUDS-representation of the tree $s$: Simply replace in right-hand sides every occurrence of a
symbol $f$ of rank $d$ by $(^d)$, and add an opening parenthesis in front of the right-hand side of the
start nonterminal.
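
The symbol-wise substitution can be illustrated on an uncompressed preorder string. The following is a minimal sketch; the dictionary `rank` and the function names are assumptions made for this illustration and do not come from the cited works.

```python
def dfuds(preorder, rank):
    """Return the DFUDS string of the tree given by its preorder traversal.

    Each symbol of rank d is replaced by d opening parentheses followed by
    one closing parenthesis; one extra opening parenthesis is prepended.
    """
    return "(" + "".join("(" * rank[sym] + ")" for sym in preorder)


def is_balanced(w):
    """Check that w is a well-parenthesized string."""
    excess = 0
    for ch in w:
        excess += 1 if ch == "(" else -1
        if excess < 0:
            return False
    return excess == 0


# Tree g(f(a,a),a,h(a)) from the text, written in preorder.
RANK = {"g": 3, "f": 2, "h": 1, "a": 0}
EXAMPLE = dfuds("gfaaaha", RANK)
```

Because the substitution is applied symbol by symbol, the same replacement can be carried out on the right-hand sides of an SLP without expanding it.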

The starting position of the encoding of a node $i \in \{1, \ldots, N\}$ in the DFUDS-representation
can be found as $\mathrm{select}_)(i-1) + 1$ for $i > 1$, and for $i = 1$ it is 2. Vice versa, if $k$ is the starting
position of the encoding of a node in the DFUDS-representation, then the preorder number of
that node is $\mathrm{rank}_)(k-1) + 1$.
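
The two conversions can be checked on the DFUDS string of the example tree $g(f(a,a),a,h(a))$. The sketch below uses naive linear-time rank and select on an uncompressed string (the function names are chosen for this illustration); the data structure from the text answers the same queries on the SLP in time $O(\log N)$.

```python
def rank_close(w, k):
    """Number of ')' among the first k positions of w (positions are 1-based)."""
    return w[:k].count(")")


def select_close(w, k):
    """1-based position of the k-th ')' in w."""
    seen = 0
    for pos, ch in enumerate(w, start=1):
        if ch == ")":
            seen += 1
            if seen == k:
                return pos
    raise ValueError("fewer than k closing parentheses")


def dfuds_start(w, i):
    """Start position in w of the encoding of the node with preorder number i."""
    return 2 if i == 1 else select_close(w, i - 1) + 1


def preorder_number(w, k):
    """Preorder number of the node whose encoding starts at position k of w."""
    return rank_close(w, k - 1) + 1


W = "(((()(())))())"  # DFUDS-representation of g(f(a,a),a,h(a))
STARTS = [dfuds_start(W, i) for i in range(1, 8)]
```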

In [24], it is shown that the tree navigation operations from the theorem can be implemented
on the DFUDS-representation using a constant number of rank, select, findclose, findopen and
rmqi-operations. Together with the above mentioned results from [8] this shows the theorem. □

The data structure of [8] also allows to compute the depth and height of a given tree node in time
$O(\log N)$. It is not clear to us whether this result can be extended to our setting.
In [24] it is shown that the depth of a given node can be computed in constant time on the
DFUDS-representation. But this uses an extra data structure, and it is not clear whether this extra
data structure can be adapted so that it works for an SLP-compressed DFUDS-representation.
On the other hand, in Section 5.3.2 we show that the height and depth of a given node of an
SLP-compressed tree can at least be computed in polynomial time.

5.2 Pattern matching 

In contrast to navigation problems, simple pattern matching problems become quite difficult for
SLP-compressed trees. The pattern matching problem for SLP-compressed trees can be formalized
as follows: Given a tree $s \in T(\mathcal{F} \cup \mathcal{X})$, called the pattern, where every parameter $x \in \mathcal{X}$ occurs at
most once, and an SLP $\mathbb{A}$ producing a tree $t \in T(\mathcal{F})$, is there a substitution $\sigma : \mathcal{X} \to T(\mathcal{F})$ such
that $\sigma(s)$ is a subtree of $t$? Here, $\sigma(s) \in T(\mathcal{F})$ denotes the tree obtained from $s$ by substituting
each variable $x \in \mathcal{X}$ by the tree $\sigma(x)$. Note that the pattern is given in uncompressed form. If the
tree $t$ is given by a TSLP, the corresponding problem can be solved in polynomial time [40] (even
if the pattern tree $s$ is given by a TSLP as well).

Theorem 11. The pattern matching problem for SLP-compressed trees is NP-complete. Moreover,
NP-hardness holds for a fixed pattern of the form $f(x, a)$.

Proof. The problem is contained in NP because one can guess a node $i \in \{1, \ldots, |t|\}$ and verify
whether the subtree of $t$ rooted in $i$ matches the pattern $s$. The verification is possible in polynomial
time by comparing all relevant symbols using Theorem 10.

By [30, Theorem 3.13] it is NP-complete to decide for given SLPs $\mathbb{A}, \mathbb{B}$ over $\{0,1\}$ with
$|\mathrm{val}(\mathbb{A})| = |\mathrm{val}(\mathbb{B})|$ whether there exists a position $i$ such that $\mathrm{val}(\mathbb{A})[i] = \mathrm{val}(\mathbb{B})[i] = 1$. This question can be
reduced to the pattern matching problem with a fixed pattern. One can compute in polynomial
time from $\mathbb{A}$ and $\mathbb{B}$ an SLP $\mathbb{T}$ for the comb tree $t(\mathrm{val}(\mathbb{A}), \mathrm{val}(\mathbb{B}))$. There exists a position $i$ such
that $\mathrm{val}(\mathbb{A})[i] = \mathrm{val}(\mathbb{B})[i] = 1$ if and only if the pattern $f_1(x, 1)$ occurs in $t(\mathrm{val}(\mathbb{A}), \mathrm{val}(\mathbb{B}))$. □

5.3 Tree evaluation problems 

The algorithmic difficulty of SLP-compressed trees already becomes clear when computing the 
height. For TSLPs it is easy to see that the height of the produced tree can be computed in linear 
time: Compute bottom-up for each nonterminal the height of the produced tree and the depths of 
the parameter nodes. However, this direct approach fails for SLPs since each nonterminal encodes 
a possibly exponential number of trees. The crucial observation to solve this problem is that one 
can store and compute the required information for each nonterminal in a compressed form. 

In the following we present a general framework to define and solve evaluation problems on SLP- 
compressed trees. We assign to each alphabet symbol of rank n an n-ary operator which defines 
the value of a tree by evaluating it bottom-up. This approach includes natural tree problems like 
computing the height of a tree, evaluating a Boolean expression or determining whether a fixed 
tree automaton accepts a given tree. We only consider operators on Z but other domains with an 
appropriate encoding of the elements are also possible. To be able to consider arbitrary arithmetic
expressions properly, it is necessary to allow the set of constants of a ranked alphabet $\mathcal{F}$ to be
infinite, i.e. $\mathcal{F}_0 \subseteq \mathbb{Z}$.

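Definition 12. Let $\mathcal{D} \subseteq \mathbb{Z}$ be a (possibly infinite) domain of integers and let $\mathcal{F}$ be a ranked
alphabet with $\mathcal{F}_0 = \mathcal{D}$. An interpretation $\mathcal{I}$ of $\mathcal{F}$ over $\mathcal{D}$ assigns to each function symbol $f \in \mathcal{F}_n$ an
$n$-ary function $f^{\mathcal{I}} : \mathcal{D}^n \to \mathcal{D}$ with the restriction that $a^{\mathcal{I}} = a$ for all $a \in \mathcal{D}$. We lift the definition
of $\mathcal{I}$ to $T(\mathcal{F})$ inductively by

$$(f\, t_1 \cdots t_n)^{\mathcal{I}} = f^{\mathcal{I}}(t_1^{\mathcal{I}}, \ldots, t_n^{\mathcal{I}}),$$

where $f \in \mathcal{F}_n$ and $t_1, \ldots, t_n \in T(\mathcal{F})$.

Definition 13. The $\mathcal{I}$-evaluation problem for SLP-compressed trees is the following problem:
Given an SLP $\mathbb{A}$ over $\mathcal{F}$ with $\mathrm{val}(\mathbb{A}) \in T(\mathcal{F})$, compute $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$.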
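
For an uncompressed tree, the bottom-up evaluation from Definition 12 can be carried out directly on the preorder string. The following is a minimal sketch; the encoding of an interpretation as a dictionary `interp` of (rank, function) pairs is an assumption made for this illustration.

```python
def evaluate(preorder, interp):
    """Evaluate a tree given in preorder (a list of symbols) bottom-up.

    interp maps every symbol of rank n > 0 to a pair (n, function);
    any symbol not in interp is an integer constant and evaluates to itself.
    """
    stack = []
    for sym in reversed(preorder):
        if sym in interp:
            n, fn = interp[sym]
            args = [stack.pop() for _ in range(n)]  # first argument is on top
            stack.append(fn(*args))
        else:
            stack.append(sym)
    if len(stack) != 1:
        raise ValueError("input is not the preorder traversal of a single tree")
    return stack[0]


# Two sample interpretations: integer arithmetic and tree height.
ARITH = {"+": (2, lambda x, y: x + y), "*": (2, lambda x, y: x * y)}
HEIGHT = {"f": (2, lambda x, y: 1 + max(x, y)), "h": (1, lambda x: 1 + x)}
```

This direct approach needs time linear in the size of the tree, which can be exponential in the size of an SLP; the remainder of the section is about avoiding this expansion.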

5.3.1 Reduction to caterpillar trees 

In this section, we reduce the $\mathcal{I}$-evaluation problem for SLP-compressed trees to the corresponding
problem for SLP-compressed caterpillar trees. A tree $t \in T(\mathcal{F})$ is called a caterpillar tree if every
node has at most one child which is not a leaf. Let $s \in \mathcal{F}^*$ be an arbitrary string. Then $s^{\mathcal{I}} \in \mathcal{F}^*$
denotes the unique string obtained from $s$ by replacing every maximal substring $t \in T(\mathcal{F})$ of $s$ by
its value $t^{\mathcal{I}}$. By Lemma 3 we can factorize $s$ uniquely as $s = t_1 \cdots t_n u$ where $t_1, \ldots, t_n \in T(\mathcal{F})$
and $u$ is a fragment. Hence $s^{\mathcal{I}} = m_1 \cdots m_n u^{\mathcal{I}}$ with $m_1, \ldots, m_n \in \mathcal{D}$. Since $u$ is a fragment, the
string $u^{\mathcal{I}}$ is the fragment of a caterpillar tree (briefly, caterpillar fragment in the following).

Example 14. Let $\mathcal{F} = \{0, 1, 2, +, \times\}$ with the standard interpretation on integers ($+$ and $\times$ are
considered as binary operators). Consider $s = 0\,2 + 2 + + \times 2 + 2\,1 + \times$. Since $+\,2\,1$ evaluates to
3, and $\times\,2\,3$ evaluates to 6, we have $s^{\mathcal{I}} = 0\,2 + 2 + +\, 6 + \times$.
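
The string $s^{\mathcal{I}}$ of Example 14 can be computed for an uncompressed string by a single right-to-left scan. The sketch below is only an illustration under the assumption that constants are Python integers and operators are the strings `"+"` and `"*"`; it is not the compressed procedure developed in the proof of Theorem 15.

```python
def partial_eval(s, ops):
    """Replace every maximal complete subtree of the string s by its value.

    s is a list whose entries are integer constants or operator symbols;
    ops maps an operator symbol to a pair (rank, function).
    """
    out = []  # tokens to the right of the current position, nearest one last
    for sym in reversed(s):
        if sym not in ops:
            out.append(sym)
            continue
        rank, fn = ops[sym]
        args = out[-rank:][::-1] if rank <= len(out) else None
        if args is not None and all(isinstance(a, int) for a in args):
            # All arguments are complete trees, so this subtree is complete too.
            del out[-rank:]
            out.append(fn(*args))
        else:
            # Some argument is missing or incomplete: keep the operator symbol.
            out.append(sym)
    return out[::-1]


OPS = {"+": (2, lambda x, y: x + y), "*": (2, lambda x, y: x * y)}
S = [0, 2, "+", 2, "+", "+", "*", 2, "+", 2, 1, "+", "*"]
```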

Our reduction to caterpillar trees only works for interpretations that satisfy a certain growth
condition. We say that an interpretation $\mathcal{I}$ is polynomially bounded if there exist constants $\alpha, \beta \ge 0$
such that for every tree $t \in T(\mathcal{F})$ (we denote the absolute value of an integer $z$ by $\mathrm{abs}(z)$ instead
of $|z|$ in order not to confuse it with the size $|t|$ of a tree),

$$\mathrm{abs}(t^{\mathcal{I}}) \le \Big(\beta \cdot |t| + \sum_{i \in L} \mathrm{abs}(t[i])\Big)^{\alpha},$$

where $L \subseteq \{1, \ldots, |t|\}$ is the set of leaves of $t$. The purpose of this definition is to ensure that
for every SLP $\mathbb{A}$ with $\mathrm{val}(\mathbb{A}) \in T(\mathcal{F})$, both the length of the binary encoding of $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$ and the
integer constants that appear in $\mathbb{A}$ are polynomially bounded in $|\mathbb{A}|$.

Theorem 15. Let $\mathcal{I}$ be a polynomially bounded interpretation. Then the $\mathcal{I}$-evaluation problem for
SLP-compressed trees is polynomial time Turing-reducible to the $\mathcal{I}$-evaluation problem for
SLP-compressed caterpillar trees.

Proof. In the proof we use an extension of SLPs by the cut-operator, called composition systems.
A composition system $\mathbb{A} = (N, \Sigma, P, S)$ is an SLP where $P$ may also contain rules of the form
$A \to B[i : j]$ where $A, B \in N$ and $i, j \ge 0$. Here we let $\mathrm{val}(A) = \mathrm{val}(B)[i : j]$. It is known [TO]
that a given composition system can be transformed in polynomial time into an SLP
with the same value. One can also allow mixed rules $A \to X_1 \cdots X_n$ where each $X_i$ is either a
terminal, a nonterminal or an expression of the form $B[i : j]$, which clearly can be eliminated in
polynomial time.

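Let $\mathbb{A} = (N, \mathcal{F}, P, S)$ be the input SLP in Chomsky normal form. We use the notation $c(A) =
c(\mathrm{val}(A))$ as in the proof of Theorem [SJ We will compute a composition system where for each
nonterminal $A \in N$ there are nonterminals $A_1$ and $A_2$ in the composition system such that the
following holds: Assume that $\mathrm{val}(A) = t_1 \cdots t_n s$, where $t_1, \ldots, t_n \in T(\mathcal{F})$ and $s$ is a fragment.
Hence, $c(A) = (n, \mathrm{gaps}(s))$. Then we will have

• $\mathrm{val}(A_1) = t_1^{\mathcal{I}} \cdots t_n^{\mathcal{I}} \in \mathcal{D}^*$, and

• $\mathrm{val}(A_2) = s^{\mathcal{I}}$.

In particular, $\mathrm{val}(A_1)\,\mathrm{val}(A_2) = \mathrm{val}(A)^{\mathcal{I}}$ and $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$ is given by the single number in $\mathrm{val}(S_1)$.

The computation is straightforward for rules of the form $A \to f$ with $A \in N$ and $f \in \mathcal{F}$: If
$\mathrm{rank}(f) = 0$, then $\mathrm{val}(A_1) = f$ and $\mathrm{val}(A_2) = \varepsilon$. If $\mathrm{rank}(f) > 0$, then $\mathrm{val}(A_1) = \varepsilon$ and $\mathrm{val}(A_2) = f$.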

For a nonterminal $A \in N$ with the rule $A \to BC$ we make a case distinction depending on
$c(B) = (b_1, b_2)$ and $c(C) = (c_1, c_2)$.

Case $b_2 \le c_1$: Then concatenating $\mathrm{val}(B)$ and $\mathrm{val}(C)$ yields a new tree $t_{\mathrm{new}}$ (or $\varepsilon$ if $b_2 = 0$) in
$\mathrm{val}(A)$. Note that $t_{\mathrm{new}}^{\mathcal{I}}$ is the value of the tree $\mathrm{val}(B_2)\,\mathrm{val}(C_1)[1 : b_2]$. Hence we can compute $t_{\mathrm{new}}^{\mathcal{I}}$
in polynomial time by computing an SLP that produces $\mathrm{val}(B_2)\,\mathrm{val}(C_1)[1 : b_2]$ and querying the
oracle for caterpillar trees. We add the following rules to the composition system:

$$A_1 \to B_1\, t_{\mathrm{new}}^{\mathcal{I}}\, C_1[b_2 + 1 : c_1], \qquad A_2 \to C_2.$$

Case $b_2 > c_1$: Then all trees and the fragment produced by $C$ are inserted into the gaps of the
fragment encoded by $B$. If $c_1 = 0$ (i.e., $\mathrm{val}(C_1) = \varepsilon$), then we add the productions $A_1 \to B_1$ and
$A_2 \to B_2 C_2$. Now assume that $c_1 > 0$. Consider the fragment

$$s = \mathrm{val}(B_2)\, \mathrm{val}(C_1)\, \mathrm{val}(C_2).$$

Intuitively, this fragment $s$ is obtained by taking the caterpillar fragment $\mathrm{val}(B_2)$, where the first
$c_1$ many gaps are replaced by the constants from the sequence $\mathrm{val}(C_1)$ and the $(c_1+1)$-st gap is
replaced by the caterpillar fragment $\mathrm{val}(C_2)$, see Figure 4. If $s$ is not already a caterpillar fragment,
then we have to replace the (unique) largest factor of $s$ which belongs to $T(\mathcal{F})$ by its value under
$\mathcal{I}$ to get $s^{\mathcal{I}}$. To do so we proceed as follows: Consider the tree $t' = \mathrm{val}(B_2)\, \mathrm{val}(C_1)\, \diamond^{b_2 - c_1}$, where
$\diamond$ is an arbitrary symbol of rank 0, and let $r = |\mathrm{val}(B_2)| + c_1 + 1$ (the position of the first $\diamond$ in $t'$).
Let $q$ be the parent node of $r$, which can be computed in polynomial time by Theorem 10. Using
Lemma 4 we compute the position $p$ of the first occurrence of a symbol in $t'[q+1 :]$ with rank $> 0$.
If no such symbol exists, then $s$ is already a caterpillar fragment and we add the rules $A_1 \to B_1$
and $A_2 \to B_2 C_1 C_2$ to the composition system. Otherwise $p$ is the first symbol of the largest factor
from $T(\mathcal{F})$ described above. Using point (d) of Theorem 10 we can compute in polynomial time the last
position $p'$ of the subtree of $t'$ that is rooted in $p$. Note that the position $p$ must belong to $\mathrm{val}(B_2)$
and that $p'$ must belong to $\mathrm{val}(C_1)$ (since $c_1 > 0$). The string $t_{\mathrm{cat}} = (\mathrm{val}(B_2)\,\mathrm{val}(C_1))[p : p']$ is
a caterpillar tree for which we can compute an SLP in polynomial time by the above remark on
composition systems. Hence, using the oracle we can compute the value $t_{\mathrm{cat}}^{\mathcal{I}}$. We then add the
rules

$$A_1 \to B_1, \qquad A' \to B_2 C_1, \qquad A_2 \to A'[: p-1]\; t_{\mathrm{cat}}^{\mathcal{I}}\; A'[p'+1 :]\; C_2$$

to the composition system. This completes the proof. □

Figure 4: An example for case 2 in the proof of Theorem 15. In the left fragment we insert the
trees $a, b, a, a$ and the fragment $faf$. The latter yields, together with a part of the fragment, a new
tree $t_{\mathrm{cat}}$.

5.3.2 Polynomial time solvable evaluation problems 

Next, we present several applications of Theorem 15. We start with the height of a tree.

Theorem 16. The height of a tree $t \in T(\mathcal{F})$ given by an SLP $\mathbb{A}$ is computable in polynomial
time.

Proof. We can assume that $t$ is not a single constant. We replace every symbol in $\mathcal{F}_0$ by the integer
0. Then, the height of $t$ is given by its value under the interpretation $\mathcal{I}$ with $f^{\mathcal{I}}(a_1, \ldots, a_n) =
1 + \max\{a_1, \ldots, a_n\}$ for symbols $f \in \mathcal{F}_n$ with $n > 0$. Clearly, $\mathcal{I}$ is polynomially bounded. By
Theorem 15 it is enough to show how to evaluate a caterpillar tree $t$ given by an SLP $\mathbb{A}$ in
polynomial time under the interpretation $\mathcal{I}$. But note that in this caterpillar tree, arbitrary
natural numbers may occur at leaf positions.

Let $\mathcal{D}_t = \{d \in \mathbb{N} \mid d \text{ labels a leaf of } t\}$. The size of this set is bounded by $|\mathbb{A}|$. For $d \in \mathcal{D}_t$ let
$v_d$ be the largest (i.e., deepest) node such that $d$ is the label of a child of node $v_d$ (in particular,
$v_d$ is not a leaf). Let us first argue that $v_d$ can be computed in polynomial time.

Let $k$ be the maximal position in $t$ where a symbol of rank larger than zero occurs. The
number $k$ is computable in polynomial time by Lemma 4 (points 5 and 3). Again using Lemma 4
we compute the position of $d$'s last (resp., first) occurrence in $t[: k]$ (resp., $t[k+1 :]$). Then using
Theorem 10 we compute the parent nodes of those two nodes in $t$ and take the maximum (i.e.,
the deeper one) of both. This node is $v_d$.

Assume that $\mathcal{D}_t = \{d_1, \ldots, d_m\}$, where w.l.o.g. $v_{d_1} < v_{d_2} < \cdots < v_{d_m}$ (if $v_{d_i} = v_{d_j}$ for $d_i < d_j$,
then we simply ignore $d_i$ in the following consideration). Note that $v_{d_m}$ is the maximal position
in $t$ where a symbol of rank larger than zero occurs (called $k$ above). Let $t_i$ be the subtree rooted
at $v_{d_i}$. Then $t_m^{\mathcal{I}} = d_m + 1$. We now claim that from the value $t_{i+1}^{\mathcal{I}}$ we can compute in polynomial
time the value $t_i^{\mathcal{I}}$. The crucial point is that we can ignore all constants that appear in the interval
$[v_{d_i} + 1, v_{d_{i+1}} - 1]$ except for $d_i$. More precisely, assume that $a = t_{i+1}^{\mathcal{I}}$ and let $b$ be the number of
occurrences of symbols of rank at least one in the interval $[v_{d_i} + 1, v_{d_{i+1}} - 1]$. Also this number
can be computed in polynomial time by Lemma 4. Then the value of $t_i^{\mathcal{I}}$ is $\max\{a + b + 1, d_i + 1\}$.
Finally, using the same argument, we can compute $t^{\mathcal{I}}$ from $t_1^{\mathcal{I}}$. □

Corollary 17. Given an SLP $\mathbb{A}$ for a tree $t$ and a node $1 \le i \le |t|$ one can compute the depth of
$i$ in $t$ in polynomial time.

Proof. We can write $t$ as $t = uvw$, where $|u| = i - 1$ and $v$ is the subtree of $t$ rooted at node $i$.
We can compute $|v|$ in polynomial time by Theorem 10. This allows to compute in polynomial time an
SLP for the tree $u\, h^{|t|} a\, w$. Here, $h$ has rank one and $a$ has rank zero. Then the depth of $i$ in $t$ is

$$\mathrm{height}(u\, h^{|t|} a\, w) - |t|. \qquad \Box$$
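
The replacement trick works because the inserted unary chain of length $|t|$ is longer than any path of $t$, so the height of the modified tree is the depth of node $i$ plus $|t|$. A minimal sketch on uncompressed trees follows; the helper names are assumptions for this illustration, and `h` and `a` are assumed to be fresh symbols.

```python
def subtree_end(t, rank, i):
    """Return the 0-based index one past the subtree of t that starts at index i."""
    need = 1  # number of subtrees still to be read
    while need > 0:
        need += rank[t[i]] - 1
        i += 1
    return i


def height(t, rank):
    """Height of the tree with preorder traversal t (a single node has height 0)."""
    stack = []
    for sym in reversed(t):
        kids = [stack.pop() for _ in range(rank[sym])]
        stack.append(1 + max(kids) if kids else 0)
    return stack[0]


def depth_via_height(t, rank, i):
    """Depth of node i (1-based preorder number), computed as in Corollary 17."""
    start = i - 1
    u, w = t[:start], t[subtree_end(t, rank, start):]
    chain = ["h"] * len(t) + ["a"]      # the unary chain h^{|t|} a
    ext_rank = dict(rank, h=1, a=0)     # h and a are assumed to be fresh symbols
    return height(u + chain + w, ext_rank) - len(t)


RANK = {"g": 3, "f": 2, "k": 1, "c": 0}
TREE = list("gfccckc")                  # the tree g(f(c,c),c,k(c))
DEPTHS = [depth_via_height(TREE, RANK, i) for i in range(1, 8)]
```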

An interesting parameter of a tree $t$ is its Horton-Strahler number or Strahler number.
It can be defined as the value $t^{\mathcal{I}}$ under the interpretation $\mathcal{I}$ over $\mathbb{N}$
which interprets constant symbols $a \in \mathcal{F}_0$ by $a^{\mathcal{I}} = 0$ and each symbol $f \in \mathcal{F}_n$ with $n > 0$ as
follows: Let $a_1, \ldots, a_n \in \mathbb{N}$ and $a = \max\{a_1, \ldots, a_n\}$. We set $f^{\mathcal{I}}(a_1, \ldots, a_n) = a$ if exactly one
of $a_1, \ldots, a_n$ is equal to $a$, and otherwise $f^{\mathcal{I}}(a_1, \ldots, a_n) = a + 1$. The Strahler number was first
defined in hydrology, but also has many applications in computer science, e.g. to calculate
the minimum number of registers required to evaluate an arithmetic expression.
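
On an uncompressed tree this interpretation can be evaluated with a simple stack. The following sketch assumes the tree is given as a preorder string together with a hypothetical `rank` dictionary.

```python
def strahler(t, rank):
    """Strahler number of the tree with preorder traversal t."""
    stack = []
    for sym in reversed(t):
        kids = [stack.pop() for _ in range(rank[sym])]
        if not kids:
            stack.append(0)  # constants are interpreted by 0
        else:
            a = max(kids)
            # a if the maximum is attained exactly once, a + 1 otherwise
            stack.append(a if kids.count(a) == 1 else a + 1)
    return stack[0]


RANK = {"f": 2, "a": 0}
```

For example, the complete binary tree `ffaafaa` of height 2 has Strahler number 2, whereas the caterpillar tree `fafafaa` of height 3 has Strahler number 1.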

Theorem 18. Given an SLP $\mathbb{A}$ for a tree $t$, one can compute the Strahler number of $t$ in polynomial
time.

Proof. Note that the interpretation $\mathcal{I}$ above is very similar to the one from the proof of Theorem 16.
The only difference is that the uniqueness of the maximum among the children of a node also affects
the evaluation. Therefore the proof of Theorem 16 must be slightly modified by considering for
each $d \in \mathbb{N}$ occurring in $t$ the two deepest leaves in $t$ labelled with $d$ (or the unique leaf labelled by
$d$ if $d$ occurs exactly once). Let $i$ and $j$ be the parents of those two leaves ($i \ge j$) and let $t_i$ (resp.,
$t_j$) be the subtree of $t$ rooted at $i$ (resp., $j$). The nodes $i$ and $j$ can be computed in polynomial
time as in the proof of Theorem 16. We have $t_i^{\mathcal{I}} \ge d$, and therefore $t_j^{\mathcal{I}} \ge d + 1$. This implies
that any further occurrence of $d$ that is higher up in the tree has no influence on the evaluation
process. The rest of the argument is similar to the proof of Theorem 16. □

If the interpretation $\mathcal{I}$ is clear from the context, we also speak of the problem of evaluating
SLP-compressed $\mathcal{F}$-trees. In the following theorem the interpretation is given by the Boolean
operations $\wedge$ and $\vee$ over $\{0,1\}$.

Theorem 19. Evaluating SLP-compressed $\{\wedge, \vee, 0, 1\}$-trees can be done in polynomial time.

Proof. Let $\mathbb{A}$ be an SLP over $\{\wedge, \vee, 0, 1\}$ such that $\mathrm{val}(\mathbb{A})$ is a caterpillar tree. Define a left
caterpillar tree to be a tree of the form $uv$, where $u \in \{\wedge, \vee\}^*$, $v \in \{0,1\}^*$ and $|v| = |u| + 1$. That
means that the main branch of the caterpillar tree grows to the left. The evaluation of $\mathrm{val}(\mathbb{A})$ is
done in two steps. In a first step, we compute in polynomial time from $\mathbb{A}$ a new SLP $\mathbb{B}$ such that
$\mathrm{val}(\mathbb{B})$ is a left caterpillar tree and $\mathrm{val}(\mathbb{A})^{\mathcal{I}} = \mathrm{val}(\mathbb{B})^{\mathcal{I}}$. In a second step, we show how to evaluate a left
caterpillar tree. We can assume that $\mathrm{val}(\mathbb{A})$ is neither 0 nor 1.

Step 1. (See Figure 5 for an illustration of step 1.) Since $\mathrm{val}(\mathbb{A})$ is a caterpillar tree, we have
$\mathrm{val}(\mathbb{A}) = uv$ with $u \in \{\wedge, \vee, \wedge 0, \wedge 1, \vee 0, \vee 1\}^* \cdot \{\wedge, \vee\}$, $v \in \{0,1\}^*$ and $|v|$ is 1 plus the number
of occurrences of the symbols $\wedge, \vee$ in $u$ that are not followed by 0 or 1 in $u$. We can compute
bottom-up the length of the maximal suffix of $\mathrm{val}(\mathbb{A})$ from $\{0,1\}^*$ in polynomial time. Hence,
by Lemma 4 we can compute in polynomial time SLPs $\mathbb{A}_1$ and $\mathbb{A}_2$ such that $\mathrm{val}(\mathbb{A}_1) = u$ and
$\mathrm{val}(\mathbb{A}_2) = v$.

We will show how to eliminate all occurrences of the patterns $\wedge 0, \wedge 1, \vee 0, \vee 1$. For this, it is
technically easier to replace every occurrence of $\circ a$ by a new symbol $\circ_a$, where $\circ \in \{\wedge, \vee\}$ and
$a \in \{0,1\}$. Let $\varphi : \{\wedge, \vee, \wedge 0, \wedge 1, \vee 0, \vee 1\}^* \to \{\wedge, \vee, \wedge_0, \wedge_1, \vee_0, \vee_1\}^*$ be the mapping that replaces
every occurrence of $\circ a$ by the new symbol $\circ_a$ ($\circ \in \{\wedge, \vee\}$, $a \in \{0,1\}$). This mapping is a rational
transformation. Hence, using 0 Theorem 1], we can compute in polynomial time an SLP $\mathbb{B}_1$ for
$\varphi(\mathrm{val}(\mathbb{A}_1))$. We now compute, using Lemma 4, the position $i$ in $\mathrm{val}(\mathbb{B}_1)$ of the first occurrence of
a symbol from $\{\wedge_0, \vee_1\}$. Next, we compute an SLP $\mathbb{C}_1$ for the prefix $\mathrm{val}(\mathbb{B}_1)[: i-1]$, i.e., we cut
off the suffix starting in position $i$. Moreover, we compute the number $j$ of occurrences of symbols
from $\{\wedge, \vee\}$ in the suffix $\mathrm{val}(\mathbb{B}_1)[i :]$ and compute an SLP $\mathbb{B}_2$ for the string $0\, \mathrm{val}(\mathbb{A}_2)[j+2 :]$ in case
$\mathrm{val}(\mathbb{B}_1)[i] = \wedge_0$ and $1\, \mathrm{val}(\mathbb{A}_2)[j+2 :]$ in case $\mathrm{val}(\mathbb{B}_1)[i] = \vee_1$. Then $\mathrm{val}(\mathbb{A})$ evaluates to the same
truth value as $\varphi^{-1}(\mathrm{val}(\mathbb{C}_1))\, \mathrm{val}(\mathbb{B}_2)$. The reason for this is that $\varphi^{-1}(\mathrm{val}(\mathbb{B}_1)[i :])\, \mathrm{val}(\mathbb{A}_2)[: j+1]$ is
a tree which evaluates to 0 (resp., 1) if $\mathrm{val}(\mathbb{B}_1)[i] = \wedge_0$ (resp., $\mathrm{val}(\mathbb{B}_1)[i] = \vee_1$), because $0 \wedge x = 0$
(resp., $1 \vee x = 1$).

Note that $\varphi^{-1}(\mathrm{val}(\mathbb{C}_1))\, \mathrm{val}(\mathbb{B}_2)$ is a caterpillar tree, where $\mathrm{val}(\mathbb{C}_1) \in \{\wedge, \vee, \wedge_1, \vee_0\}^*$ and
$\mathrm{val}(\mathbb{B}_2) \in \{0,1\}^*$. Since $1 \wedge x = x$ (resp., $0 \vee x = x$), we can delete in the string $\mathrm{val}(\mathbb{C}_1)$ all
occurrences of the symbols $\wedge_1$ and $\vee_0$ without changing the final truth value. Let $\mathbb{D}_1$ be an SLP
for the resulting string, which is easy to compute from $\mathbb{C}_1$. Then $\mathrm{val}(\mathbb{D}_1)\, \mathrm{val}(\mathbb{B}_2)$ is indeed a left
caterpillar tree.

Figure 5: An example for step 1 in the proof of Theorem 19. In the first image we find the
expression $\wedge 0$, hence we remove the remaining suffix. The expression $\vee 0$ can also be removed
without changing the final truth value.

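Step 2. To evaluate a left caterpillar tree let $\mathbb{A}_1$ and $\mathbb{A}_2$ be two SLPs where $\mathrm{val}(\mathbb{A}_1) \in \{\wedge, \vee\}^*$,
$\mathrm{val}(\mathbb{A}_2) \in \{0,1\}^*$, and $|\mathrm{val}(\mathbb{A}_2)| = |\mathrm{val}(\mathbb{A}_1)| + 1$. Let $\varphi : \{\wedge, \vee\}^* \to \{0,1\}^*$ be the homomorphism
with $\varphi(\wedge) = 1$ and $\varphi(\vee) = 0$. Using binary search, we compute the largest position $i$ such that the
reversed length-$i$ suffix of $\mathrm{val}(\mathbb{A}_2)$ is equal to the length-$i$ prefix of $\varphi(\mathrm{val}(\mathbb{A}_1))$. If $i = |\mathrm{val}(\mathbb{A}_1)|$, then
the value of $\mathrm{val}(\mathbb{A}_1)\, \mathrm{val}(\mathbb{A}_2)$ is the first symbol of $\mathrm{val}(\mathbb{A}_2)$. Otherwise, the value of $\mathrm{val}(\mathbb{A}_1)\, \mathrm{val}(\mathbb{A}_2)$
is 0 (resp., 1) if $\mathrm{val}(\mathbb{A}_1)[i+1] = \wedge$ (resp., $\mathrm{val}(\mathbb{A}_1)[i+1] = \vee$). □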

Corollary 20. If the interpretation $\mathcal{I}$ is such that $(\mathcal{D}, \wedge^{\mathcal{I}}, \vee^{\mathcal{I}})$ is a finite distributive lattice, then
the $\mathcal{I}$-evaluation problem for SLP-compressed trees can be solved in polynomial time.

Proof. By Birkhoff's representation theorem, every finite distributive lattice is isomorphic to a
lattice of finite sets, where the join (resp., meet) operation is set union (resp., intersection). This
lattice embeds into a finite power of $(\{0,1\}, \wedge, \vee)$. □

5.3.3 Difficult arithmetical evaluation problems 

Assume that $\mathcal{I}$ is the interpretation that assigns to the symbols $+$ and $\times$ their standard meaning
over the integers. Note that this interpretation is not polynomially bounded. For instance, for the
tree $t_n = \times^n (2)^{n+1}$ we have $t_n^{\mathcal{I}} = 2^{n+1}$. Hence, if a tree $t$ is given by an SLP $\mathbb{A}$, then the number
of bits of $t^{\mathcal{I}}$ can be exponential in the size of $\mathbb{A}$. Therefore, we cannot write down the number $t^{\mathcal{I}}$
in polynomial time. The same problem arises already for numbers that are given by arithmetic
circuits (circuits over $+$ and $\times$).
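
The blow-up can be made concrete with a small SLP that doubles its nonterminals. The following sketch (the nonterminal names and helper functions are assumptions for this illustration) builds an SLP of size $O(k)$ for the tree $t_n$ with $n = 2^k$ and evaluates it by brute-force expansion, which is feasible only for small $k$.

```python
def expand(slp, start):
    """Expand a straight-line program (dict: nonterminal -> list of symbols)."""
    out, stack = [], [start]
    while stack:
        sym = stack.pop()
        if sym in slp:
            stack.extend(reversed(slp[sym]))
        else:
            out.append(sym)
    return out


def doubling_slp(k):
    """SLP of size O(k) for the tree *^n 2^(n+1) with n = 2**k operators."""
    slp = {"A0": ["*"], "B0": [2]}
    for i in range(1, k + 1):
        slp[f"A{i}"] = [f"A{i-1}", f"A{i-1}"]
        slp[f"B{i}"] = [f"B{i-1}", f"B{i-1}"]
    slp["S"] = [f"A{k}", f"B{k}", 2]
    return slp


def evaluate_product_tree(preorder):
    """Evaluate a preorder expression over '*' and integer constants."""
    stack = []
    for sym in reversed(preorder):
        stack.append(stack.pop() * stack.pop() if sym == "*" else sym)
    return stack[0]


def slp_size(slp):
    """Total length of all right-hand sides."""
    return sum(len(rhs) for rhs in slp.values())
```

For $k = 4$ the SLP has size 21, while the value $2^{17}$ of the produced tree already needs 18 bits; in general the bit length grows like $2^k$.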

In [3] it was shown that the problem of computing the $k$-th bit ($k$ is given in binary notation)
of the number to which a given arithmetic circuit evaluates belongs to the counting hierarchy.
An arithmetic circuit can be seen as a dag that unfolds to an expression tree. Dags correspond to
TSLPs where all nonterminals have rank 0. Vice versa, it was shown in [18] that a TSLP $\mathbb{A}$ over
$+$ and $\times$ can be transformed in logspace into an arithmetic circuit that evaluates to $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$. This
transformation works for any semiring. Thus, over semirings, the evaluation problems for TSLPs
and circuits (i.e., dags) have the same complexity. In particular, the problem of computing the
$k$-th bit of the output value of a TSLP-represented arithmetic expression belongs to the counting
hierarchy. Here, we show that this result even holds for arithmetic expressions that are given by
SLPs:

Theorem 21. The problem of computing for a given binary encoded number $k$ and an SLP $\mathbb{A}$
over $\{+, \times\} \cup \mathbb{Z}$ the $k$-th bit of $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$ belongs to the counting hierarchy.

Proof. We follow the strategy from [3, proof of Thm. 4.1]. Let $\mathbb{A}$ be the input SLP for the tree
$t$ and let $N = t^{\mathcal{I}}$. Then $N \le 2^{2^n}$ where $n = |\mathbb{A}|$ (this follows since the expression $t$ has size at
most $2^n$ and the value computed by an expression of size $m$ is at most $2^m$). Let $P_n$ be the set of
all prime numbers in the range $[2, 2^{2n}]$ (note that $2^{2n} \ge \log^2 N$). Then $\prod_{p \in P_n} p > N$. Also note
that each prime $p \in P_n$ has at most $2n$ bits in its binary representation. We first show that the
language

$$L = \{(\mathbb{A}, p, j) \mid \mathbb{A} \text{ is an SLP for a tree},\ n = |\mathbb{A}|,\ p \in P_n,\ 1 \le j \le 2n,\ \text{the } j\text{-th bit of } \mathrm{val}(\mathbb{A})^{\mathcal{I}} \bmod p \text{ is } 1\}$$

belongs to the counting hierarchy. The rest of the proof then follows the argument in [5]: Using
the DLOGTIME-uniform $\mathsf{TC}^0$-circuit family for transforming a number from its Chinese
remainder representation into its binary representation one defines a $\mathsf{TC}^0$-circuit of size $2^{O(n)}$ that
has input gates $x(p,j)$ (where $n = |\mathbb{A}|$, $p \in P_n$, $1 \le j \le 2n$). If we set $x(p,j)$ to true iff $(\mathbb{A}, p, j) \in L$
(this means that the input gates $x(p,j)$ receive the Chinese remainder representation of $\mathrm{val}(\mathbb{A})^{\mathcal{I}}$),
then the circuit outputs correctly the (exponentially many) bits of the binary representation of
$\mathrm{val}(\mathbb{A})^{\mathcal{I}}$. Then, as in [3, proof of Thm. 4.1], one shows by induction on the depth of a gate that the
problem whether a given gate of that circuit (the gate is specified by a bit string of length $O(n)$)
evaluates to true is in the counting hierarchy, where the level in the counting hierarchy depends
on the level of the gate in the circuit.¹

Hence we have to show that $L$ belongs to the counting hierarchy. Let $\mathbb{A}$ be an SLP for a tree
$t$, $n = |\mathbb{A}|$, $p \in P_n$ and $1 \le j \le 2n$. By Theorem 15 it suffices to consider the case that $t$ is a
caterpillar tree; the polynomial time Turing reduction in Theorem 15 increases the level in the
counting hierarchy by one. Also note that we use a uniform version of Theorem 15 where the
interpretation (addition and multiplication in $\mathbb{Z}_p$) is part of the input. This is not a problem, since
the prime number $p$ has at most $2n$ bits, so all values that can appear only need $2n$ bits.

Let $m$ be the number of operators in $t$, i.e., the total number of occurrences of the symbols
$+$ and $\times$ in $\mathrm{val}(\mathbb{A})$. Note that $m$ can be exponentially large in $|\mathbb{A}|$, but its binary representation
can be computed in polynomial time by Lemma 4 (point 5). We now define a matrix of numbers
$x^t_{i,j} \in \mathbb{Z}_p$ ($i, j \in [1, m+1]$) such that

$$t^{\mathcal{I}} = \sum_{i=1}^{m+1} \prod_{j=1}^{m+1} x^t_{i,j}.$$

Moreover, we will show that given $\mathbb{A}$ and binary encoded numbers $i, j \in [1, m+1]$, the binary
encoding of $x^t_{i,j}$ (which consists of at most $2n$ bits) can be computed in polynomial time.

We define the numbers $x^t_{i,j}$ inductively over the structure of the caterpillar tree $t$. For the
caterpillar tree $t = a$ (with $a \in \mathbb{Z}_p$) we set $x^t_{1,1} = a$. Now assume that $t = f(a, s)$ or $t = f(s, a)$
for an operator $f \in \{+, \times\}$, a caterpillar tree $s$ with $m - 1$ operators, and $a \in \mathbb{Z}_p$. In the case
$t = f(s, a)$ we assume that $m - 1 \ge 1$; this avoids ambiguities in case $t = f(a, b)$ for $a, b \in \mathbb{Z}_p$.
Assume that the numbers $x^s_{i,j}$ are already defined for $i, j \in [1, m]$. If $f = +$, then we set:

$$\begin{aligned}
x^t_{1,1} &= a, \\
x^t_{1,j} &= 1 && \text{for } j \in [2, m+1], \\
x^t_{i,1} &= 1 && \text{for } i \in [2, m+1], \\
x^t_{i,j} &= x^s_{i-1,j-1} && \text{for } i, j \in [2, m+1].
\end{aligned}$$

We get

$$\sum_{i=1}^{m+1} \prod_{j=1}^{m+1} x^t_{i,j} = a + \sum_{i=2}^{m+1} \prod_{j=2}^{m+1} x^t_{i,j} = a + \sum_{i=1}^{m} \prod_{j=1}^{m} x^s_{i,j} = a + s^{\mathcal{I}} = t^{\mathcal{I}}.$$

If $f = \times$, then we set:

$$\begin{aligned}
x^t_{1,i} &= 0 && \text{for } i \in [1, m+1], \\
x^t_{i,1} &= a && \text{for } i \in [2, m+1], \\
x^t_{i,j} &= x^s_{i-1,j-1} && \text{for } i, j \in [2, m+1].
\end{aligned}$$

We get

$$\sum_{i=1}^{m+1} \prod_{j=1}^{m+1} x^t_{i,j} = \sum_{i=2}^{m+1} a \cdot \prod_{j=2}^{m+1} x^t_{i,j} = a \cdot \sum_{i=1}^{m} \prod_{j=1}^{m} x^s_{i,j} = a \cdot s^{\mathcal{I}} = t^{\mathcal{I}}.$$

¹ Let us explain the differences to [3, proof of Thm. 4.1]: In [3], the arithmetic expression is given by a circuit
instead of an SLP. This simplifies the proof, because if we replace in the above language $L$ the SLP $\mathbb{A}$ by a circuit,
then we can decide the language $L$ in polynomial time (we only have to evaluate a circuit modulo a prime number
with polynomially many bits). In our situation, we can only show that $L$ belongs to a certain level of the counting
hierarchy. But this suffices to prove the theorem, only the level in the counting hierarchy increases by the number
of levels in which the set $L$ sits.

We now show that the binary encodings of the numbers $x^t_{i,j}$ can be computed in polynomial
time (given $\mathbb{A}$, $i$, $j$). For this let us introduce some notation: For our caterpillar tree $t = \mathrm{val}(\mathbb{A})$
(which contains $m$ occurrences of operators) and $i \in [1, m]$, $j \in [1, m+1]$ we define inductively
$\mathrm{op}(t, i) \in \{+, \times\}$ and $\mathrm{operand}(t, j) \in \mathbb{Z}_p$ as follows:

• If $t = a \in \mathbb{Z}_p$, then let $\mathrm{operand}(t, 1) = a$ (note that in this case we have $m = 0$, hence the
$\mathrm{op}(t, i)$ do not exist).

• If $t = f(a, s)$ or ($t = f(s, a)$ and $m - 1 \ge 1$) with $a \in \mathbb{Z}_p$, then we set $\mathrm{op}(t, 1) = f$,
$\mathrm{op}(t, i) = \mathrm{op}(s, i-1)$ for $i \in [2, m]$, $\mathrm{operand}(t, 1) = a$, and $\mathrm{operand}(t, j) = \mathrm{operand}(s, j-1)$
for $j \in [2, m+1]$.

In other words: $\mathrm{op}(t, i)$ is the $i$-th operator in $t$, and $\mathrm{operand}(t, i)$ is the unique argument from
$\mathbb{Z}_p$ of the $i$-th operator in $t$ (recall that $t$ is a caterpillar tree). The $m$-th (and hence last) operator
in $t$ has two arguments from $\mathbb{Z}_p$; its left argument is $\mathrm{operand}(t, m)$ and its right argument is
$\mathrm{operand}(t, m+1)$. Using these notations, we can compute the numbers $x^t_{i,j}$ by the following case
distinction (correctness follows by a straightforward induction):

• $i < j$: If $\mathrm{op}(t, i) = +$ then $x^t_{i,j} = 1$, else $x^t_{i,j} = 0$.

• $i = j$: If $\mathrm{op}(t, i) = +$ then $x^t_{i,j} = \mathrm{operand}(t, j)$, else $x^t_{i,j} = 0$.

• $i > j$: If $\mathrm{op}(t, j) = +$ then $x^t_{i,j} = 1$, else $x^t_{i,j} = \mathrm{operand}(t, j)$.

So, in order to compute the $x^t_{i,j}$ it suffices to compute $\mathrm{op}(t, i)$ and $\mathrm{operand}(t, j)$, given $\mathbb{A}$, $i$, $j$. This
is possible in polynomial time: The position $k$ of the $i$-th operator in $t$ and $\mathrm{op}(t, i)$ can be computed
in polynomial time using point 3 of Lemma 4 (take $\Gamma = \{+, \times\}$). Once the position $k$ is computed,
$\mathrm{operand}(t, i)$ can be computed in polynomial time using point (b) of Theorem 10.
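
The sum-of-products identity can be checked by brute force on small caterpillar trees. The sketch below describes a caterpillar tree by its operator list and operand list (an encoding chosen for this illustration, with `"*"` standing for $\times$). For the corner entry $i = j = m+1$, where no operator exists, it uses the value $\mathrm{operand}(t, m+1)$ obtained from the base case $x^t_{1,1} = a$ of the inductive definition.

```python
from itertools import product
from math import prod


def entry(ops, operands, i, j):
    """Entry x_{i,j} (1-based) for a caterpillar tree with m = len(ops) operators.

    ops[i-1] is op(t, i) and operands[j-1] is operand(t, j).
    """
    m = len(ops)
    if i == j == m + 1:
        return operands[m]  # base case x_{1,1} = a for the innermost constant
    if i < j:
        return 1 if ops[i - 1] == "+" else 0
    if i == j:
        return operands[j - 1] if ops[i - 1] == "+" else 0
    return 1 if ops[j - 1] == "+" else operands[j - 1]


def sum_of_products(ops, operands, p):
    """Sum over i of the product over j of x_{i,j}, modulo p."""
    n = len(ops) + 1
    return sum(prod(entry(ops, operands, i, j) for j in range(1, n + 1))
               for i in range(1, n + 1)) % p


def evaluate_caterpillar(ops, operands, p):
    """Reference value: fold from the innermost operator outwards, modulo p."""
    value = operands[-1]
    for op, a in zip(reversed(ops), reversed(operands[:-1])):
        value = (a + value) % p if op == "+" else (a * value) % p
    return value % p


def check(max_ops, p):
    """Compare both computations on all caterpillar trees with <= max_ops operators."""
    for m in range(max_ops + 1):
        for ops in product("+*", repeat=m):
            for operands in product(range(p), repeat=m + 1):
                if sum_of_products(ops, operands, p) != evaluate_caterpillar(ops, operands, p):
                    return False
    return True
```

For the tree $3 + (4 \times (5 + 6))$ the four row products are 3, 0, 20 and 24, which sum to 47, the value of the expression.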

Recall that our goal is to compute a specific bit of val(A)^ mod p, where A is an SLP that 
produces a caterpillar tree, and p G [2, 2^"] is a prime, where n = jAj. We have to show that this 
problem belongs to the counting hierarchy. We have shown that 

m+1 m+1 

val(A)^ = XI n 

i=i i=i 


18 



where the binary encoding of the number x\ ^ S Zp can be computed in polynomial time, given 
We now follow again the arguments from [3]. It is known that the binary representation of
a sum (resp., product) of n many n-bit numbers can be computed in DLOGTIME-uniform $\mathrm{TC}^0$ [20].
The same holds for the problem of computing a sum (resp., product) of n many numbers from
[0, p − 1] modulo a given prime number p with O(log n) bits (it is actually much easier to argue that
the latter problem is in DLOGTIME-uniform $\mathrm{TC}^0$, see again [20]). Hence, there is a DLOGTIME-uniform
$\mathrm{TC}^0$ circuit family $(C_m)_{m \ge 1}$, where the input of $C_m$ consists of bits $x(i,j,k)$ ($i,j \in [1,m]$,
$k \in O(\log m)$) and a prime number p with O(log m) bits, such that the following holds: If $x(i,j,k)$
receives the k-th bit of a number $x_{i,j} \in \mathbb{Z}_p$, then the circuit outputs
$\sum_{i=1}^{m} \prod_{j=1}^{m} x_{i,j} \bmod p$. We take the circuit $C_{m+1}$, where $m \in 2^{O(n)}$ (recall that
n = |A| and m is the number of operators in t = val(A)). The input gate $x(i,j,k)$ receives the k-th bit
of the number $x^t_{i,j} \in \mathbb{Z}_p$ defined above. We have shown above that the bits of $x^t_{i,j}$ can be
computed in polynomial time. This allows (again in the same way as in [3, proof of Thm. 4.1]) to show
that for a given gate number of $C_{m+1}$ one can compute the truth value of the corresponding gate
within the counting hierarchy. □
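
The sum-of-products decomposition used in the proof can be sanity-checked on a small, explicitly given caterpillar tree. The following Python sketch is only an illustration under stated assumptions: the tree is given as a list of pairs (op(t, j), operand(t, j)) for j = 1, ..., m, read from the root downwards, followed by one final leaf; the case i < j is taken to be $x^t_{i,j} = 1$; and the final leaf is treated as index m + 1 with operator +. The names `evaluate_directly`, `x_entry` and `sum_of_products` are hypothetical and do not come from the paper.

```python
from math import prod

def evaluate_directly(ops, leaf, p):
    """Evaluate a_1 op_1 (a_2 op_2 (... (a_m op_m leaf))) modulo p."""
    value = leaf % p
    for op, operand in reversed(ops):
        value = (operand + value) % p if op == "+" else (operand * value) % p
    return value

def x_entry(ops, leaf, i, j, p):
    """Entry x_{i,j} with 1-based indices.

    Assumptions of this sketch: x_{i,j} = 1 for i < j, and the final
    leaf acts as operand number m+1 with operator '+'.
    """
    m = len(ops)
    op, operand = ops[j - 1] if j <= m else ("+", leaf)
    if i < j:
        return 1
    if i == j:
        return operand % p if op == "+" else 0
    return 1 if op == "+" else operand % p

def sum_of_products(ops, leaf, p):
    """Sum over i of the product over j of x_{i,j}, reduced modulo p."""
    m = len(ops)
    return sum(
        prod(x_entry(ops, leaf, i, j, p) for j in range(1, m + 2))
        for i in range(1, m + 2)
    ) % p

# Caterpillar tree 3 * (4 + (2 * (5 + 7))) evaluated modulo the prime 101.
ops = [("*", 3), ("+", 4), ("*", 2), ("+", 5)]
leaf, p = 7, 101
print(evaluate_directly(ops, leaf, p), sum_of_products(ops, leaf, p))  # 84 84
```

In the compressed setting the tree is not available explicitly; the point of the proof is that each single entry $x^t_{i,j}$ can still be computed in polynomial time from the SLP.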

Computing a certain bit of the output number of an arithmetic circuit belongs to
$\mathrm{PH}^{\mathrm{PP}^{\mathrm{PP}^{\mathrm{PP}}}}$ [2] (but no matching lower bound is known). In our situation, the level
gets even higher, so we made no effort to compute it.

We can use the technique from the proof of Theorem [21] to show the following related result.
Note that a circuit (or dag) over max and + can be evaluated in polynomial time (simply by 
computing bottom-up the value of each gate), and by the reduction from [18] the same holds for 
TSLP-compressed expressions. 
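
The bottom-up evaluation of a circuit over max and + mentioned above can be sketched as follows. The encoding of the circuit as a Python dictionary listed in topological order and the name `eval_max_plus_dag` are assumptions made for this illustration only.

```python
def eval_max_plus_dag(gates):
    """Evaluate a circuit over max and + bottom-up.

    `gates` maps a gate name to an integer constant or to a triple
    (op, left, right) with op in {"max", "+"}. Gates must be listed so
    that every gate appears after the gates it refers to.
    """
    value = {}
    for name, gate in gates.items():
        if isinstance(gate, int):
            value[name] = gate
        else:
            op, left, right = gate
            a, b = value[left], value[right]
            value[name] = max(a, b) if op == "max" else a + b
    return value

gates = {
    "c1": 1,
    "c3": 3,
    "g0": ("+", "c1", "c3"),    # 4
    "g1": ("+", "g0", "g0"),    # 8, the gate g0 is shared
    "g2": ("max", "g1", "c3"),  # 8
    "g3": ("+", "g2", "g1"),    # 16
}
print(eval_max_plus_dag(gates)["g3"])  # 16
```

Every gate is visited once, and an addition at most doubles the largest absolute value seen so far, so the bit length of the intermediate values grows by at most one per gate.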


Theorem 22. The problem of evaluating SLP-compressed ({max, +} ∪ ℤ)-trees over the integers
belongs to the counting hierarchy.


Proof. The proof follows the arguments from the proof of Theorem [21]. But since the interpretation
given by max and + is polynomially bounded, every subtree of an SLP-compressed tree evaluates to
an integer that needs only polynomially many bits with respect to the size of the SLP. Hence we do
not need the Chinese remainder theorem as in the proof of Theorem [21] and can use Theorem [15]
directly. It remains to show that the problem of evaluating SLP-compressed ({max, +} ∪ ℤ)-caterpillar
trees belongs to the counting hierarchy. For this we follow the same strategy as in the
proof of Theorem [21] and define numbers $x^t_{i,j}$ (where t = val(A) is the input caterpillar tree) such
that

val(A)^ = $\max_{1 \le i \le m+1} \sum_{j=1}^{m+1} x^t_{i,j}$.

Since the sum of n many n-bit numbers as well as the maximum of n many n-bit numbers can
be computed in DLOGTIME-uniform $\mathrm{TC}^0$ (the maximum of n many n-bit numbers can even be
computed in DLOGTIME-uniform $\mathrm{AC}^0$), one can argue as in the proof of Theorem [21]. □


Let us now turn to lower bounds for the problems of evaluating SLP-compressed arithmetic
expressions (max-plus or plus-times). For a number c ∈ ℕ consider the unary operation $+_c$ on
ℕ with $+_c(z) = z + c$. The evaluation of SLP-compressed ({max, $+_c$} ∪ ℕ)-trees is possible in
polynomial time analogously to the proof of Theorem [16]. The following theorem shows that the
general case of SLP-compressed ({max, +} ∪ ℕ)-trees is more complicated.

Theorem 23. Evaluating SLP-compressed ({max, +} ∪ ℕ)-trees is #P-hard.

Proof. Let A, B be two SLPs over {0,1} with |val(A)| = |val(B)|. We will reduce from the problem
of counting the number of occurrences of (1,1) in the convolution val(A) ⊗ val(B) ∈ $(\{0,1\}^2)^*$,
which is known to be #P-complete by [29]. Let ρ : {0,1}* → {max, +}* be the homomorphism
defined by ρ(0) = max, ρ(1) = +. One can compute in polynomial time from A and B an SLP for
the tree ρ(val(A)) 1 rev(val(B)). The corresponding tree over {max, +, 0, 1} evaluates to one plus
the number of occurrences of (1,1) in the convolution val(A) ⊗ val(B). □
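
On explicitly given (uncompressed) strings, this reduction can be checked directly. The sketch below is an illustration only: it works with plain Python strings u = val(A) and v = val(B) instead of SLPs, and the helper names `reduction_preorder`, `eval_preorder` and `count_11` are hypothetical.

```python
def reduction_preorder(u, v):
    """Preorder traversal rho(u) 1 rev(v) as a list of symbols."""
    rho = {"0": "max", "1": "+"}
    return [rho[a] for a in u] + [1] + [int(b) for b in reversed(v)]

def eval_preorder(symbols):
    """Evaluate a preorder traversal over {max, +} with integer leaves."""
    pos = 0

    def parse():
        nonlocal pos
        s = symbols[pos]
        pos += 1
        if s in ("max", "+"):
            left = parse()
            right = parse()
            return max(left, right) if s == "max" else left + right
        return s

    result = parse()
    assert pos == len(symbols)  # the traversal encodes exactly one tree
    return result

def count_11(u, v):
    """Number of occurrences of (1,1) in the convolution of u and v."""
    return sum(a == "1" and b == "1" for a, b in zip(u, v))

u, v = "10110", "11010"
print(eval_preorder(reduction_preorder(u, v)), 1 + count_11(u, v))  # 3 3
```

The intermediate value is always at least 1, so a max with a leaf from {0, 1} never changes it, while a + adds the paired bit of val(B); this is why only positions carrying (1,1) contribute.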

In [3] it was shown that the computation of a certain bit of the output value of an arithmetic
circuit (over + and ×) is #P-hard. Since a circuit can be seen as a TSLP (where all nonterminals
have rank 0), which can be transformed in polynomial time into an SLP for the same tree [10], also
the problem of computing a certain bit of val(A)^ for a given SLP A is #P-hard. For the related
problem PosSLP of deciding whether a given arithmetic circuit computes a positive number, no
non-trivial lower bound is known. For SLPs, the corresponding problem becomes PP-hard:

Theorem 24. The problem of deciding whether val(A)^ ≥ 0 for a given SLP A over {+, ×} ∪ ℤ
is PP-hard.

Proof. By [29], the following problem is PP-complete: Given SLPs A, B over {0,1} where |val(A)| =
|val(B)|, and a binary encoded number z, is the number of occurrences of (1,1) in the convoluted
string val(A) ⊗ val(B) at least z? We modify the proof of Theorem [23]. Let A, B be SLPs over
{0,1}, where N = |val(A)| = |val(B)|. Pick n > 0 such that $2^n > 2N$. Let $\rho_A : \{0,1\}^* \to \{+, \times\}^*$
be the homomorphism defined by $\rho_A(0) = +$, $\rho_A(1) = \times$ and $\rho_B : \{0,1\}^* \to \{1,2\}^*$ be the
homomorphism defined by $\rho_B(0) = 1$, $\rho_B(1) = 2$. One can compute in polynomial time from A
and B an SLP for the tree $\rho_A(\mathrm{val}(A))\,(2^n)\,\rho_B(\mathrm{rev}(\mathrm{val}(B)))$ (here $(2^n)$ stands for an SLP that
evaluates to $2^n$). Let R be the value of the corresponding tree. Note that R is calculated by starting
with the value $2^n$ and applying N additions or multiplications by 1 or 2. The number K of occurrences
of (1,1) in the convolution val(A) ⊗ val(B) corresponds to the number of multiplications by 2 in
the calculation, which can be computed from R: We have

$2^n \cdot 2^K \le R \le (2^n + 2(N-K)) \cdot 2^K \le (2^n + 2N) \cdot 2^K$

since R is maximal if (N − K) additions of 2 are followed by K multiplications by 2. Since $2N < 2^n$
we obtain $2^{n+K} \le R \le 2^{n+K} + r$ for some $r < 2^{n+K}$. Hence, K ≥ z if and only if $R - 2^{n+z} \ge 0$.
It is straightforward to compute an SLP which evaluates to $R - 2^{n+z}$. □
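
The threshold argument can be checked on small explicit inputs. The following Python sketch is an illustration under stated assumptions: it uses plain strings u = val(A) and v = val(B) rather than SLPs, picks n as the bit length of 2N (one possible choice with $2^n > 2N$), and the names `compute_R` and `threshold_test` are hypothetical.

```python
def compute_R(u, v, n):
    """Value R of the tree rho_A(u) (2^n) rho_B(rev(v)).

    The innermost operator belongs to the last position of u and v,
    so the positions are processed from right to left.
    """
    value = 2 ** n
    for a, b in zip(reversed(u), reversed(v)):
        operand = 2 if b == "1" else 1
        value = value * operand if a == "1" else value + operand
    return value

def threshold_test(u, v, z):
    """Decide 'number of (1,1) positions >= z' via the sign of R - 2^(n+z)."""
    N = len(u)
    n = max(1, (2 * N).bit_length())  # guarantees n > 0 and 2^n > 2N
    return compute_R(u, v, n) - 2 ** (n + z) >= 0

u, v = "1101", "1011"                 # (1,1) occurs at two positions
print(compute_R(u, v, 4))             # 68
print(threshold_test(u, v, 2), threshold_test(u, v, 3))  # True False
```

Here R = 68 lies between $2^{4+2} = 64$ and $2^{4+3} = 128$, which matches K = 2.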

5.3.4 Tree automata 

(Bottom-up) tree automata (see [14] for details) can be seen as finite algebras: The domain of
the algebra is the set of states, and the operations of the algebra correspond to the transitions of
the automaton. This correspondence only holds for deterministic tree automata. On the other
hand every nondeterministic tree automaton can be transformed into a deterministic one using a
powerset construction. Formally, a nondeterministic (bottom-up) tree automaton $\mathcal{A} = (Q, \mathcal{F}, \Delta, F)$
consists of a finite set of states Q, a ranked alphabet $\mathcal{F}$, a set Δ of transition rules of the form
$f(q_1, \ldots, q_n) \to q$ where $f \in \mathcal{F}_n$ and $q_1, \ldots, q_n, q \in Q$, and a set of final states F ⊆ Q. A tree
$t \in T(\mathcal{F})$ is accepted by $\mathcal{A}$ if $t \to^*_\Delta q$ for some q ∈ F, where $\to_\Delta$ is the rewriting relation defined
by Δ as usual. The uniform membership problem for tree automata asks whether a given tree
automaton $\mathcal{A}$ accepts a given tree $t \in T(\mathcal{F})$. In [55] it was shown that this problem is complete
for the class LogCFL, which is the closure of the context-free languages under logspace reductions.
LogCFL is contained in P and DSPACE($\log^2(n)$). For every fixed tree automaton, the membership
problem belongs to $\mathrm{NC}^1$ [28]. If the input tree is given by a TSLP, the uniform membership
problem becomes P-complete [34]. For non-linear TSLPs (where a parameter may occur several
times in a right-hand side) the uniform membership problem becomes PSPACE-complete, and
PSPACE-hardness holds already for a fixed tree automaton [32]. The same complexity bound
holds for SLP-compressed trees (which in contrast to non-linear TSLPs only allow exponential
compression):

Theorem 25. Given a tree automaton $\mathcal{A}$ and an SLP A for a tree $t \in T(\mathcal{F})$, it is PSPACE-complete
to decide whether $\mathcal{A}$ accepts t. Moreover, PSPACE-hardness already holds for a fixed tree
automaton.

Proof. For the upper bound we use the following lemma from [35]: If a function f : Σ* → Γ* is
PSPACE-computable and L ⊆ Γ* belongs to NSPACE($\log^k(n)$) for some constant k, then $f^{-1}(L)$
belongs to PSPACE. Given an SLP A for the tree t = val(A), one can compute the tree t by a
PSPACE-transducer by computing the symbol t[i] for every position i ∈ {1, ..., |t|}. The current
position can be stored in polynomial space and every query can be performed in polynomial time.
As remarked above the uniform membership problem for explicitly given trees can be solved in
DSPACE($\log^2(n)$).
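
The access to a single symbol t[i] of an SLP-compressed string is the standard top-down descent guided by the lengths of the derived strings. The following Python sketch illustrates it; the dictionary encoding of the SLP and the names `slp_lengths` and `symbol_at` are assumptions of this example, and the sample grammar is the one for $(ab)^{1024}$ from the introduction.

```python
def slp_lengths(rules):
    """Length of the string derived from each nonterminal.

    `rules` maps a nonterminal to a list of symbols; a symbol that is
    not a key of `rules` is a terminal. Nonterminals must be listed
    after the nonterminals occurring in their right-hand sides.
    """
    length = {}
    for lhs, rhs in rules.items():
        length[lhs] = sum(length[s] if s in rules else 1 for s in rhs)
    return length

def symbol_at(rules, length, start, i):
    """Return the i-th symbol (1-based) of the string derived from `start`."""
    if not 1 <= i <= length[start]:
        raise IndexError("position out of range")
    symbol = start
    while symbol in rules:
        # Find the symbol of the right-hand side that covers position i.
        for s in rules[symbol]:
            size = length[s] if s in rules else 1
            if i <= size:
                symbol = s
                break
            i -= size
    return symbol

# SLP with A0 -> ab and Ak -> A(k-1) A(k-1) for 1 <= k <= 10.
rules = {"A0": ["a", "b"]}
for k in range(1, 11):
    rules[f"A{k}"] = [f"A{k - 1}", f"A{k - 1}"]

length = slp_lengths(rules)
print(length["A10"])                            # 2048
print(symbol_at(rules, length, "A10", 1025))    # a
```

A transducer in the sense of the proof only has to keep the current position i, whose binary encoding has polynomially many bits, and can recompute each symbol from scratch.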

For the lower bound we use a fixed regular language L ⊆ $(\{0,1\}^2)^*$ from [29] such that the
following problem is PSPACE-complete: Given two SLPs A and B over {0,1} with |val(A)| =
|val(B)|, is val(A) ⊗ val(B) ∈ L?

Let $\mathcal{A} = (Q, \{0,1\}^2, \Delta, q_0, F)$ be a finite word automaton for L. Let A, B be two SLPs over
{0,1} with |val(A)| = |val(B)| and let T be an SLP for the comb tree t(u, v) where u = rev(val(A))
and v = rev(val(B)). We transform $\mathcal{A}$ into a tree automaton $\mathcal{A}_t$ over $\{f_0, f_1, 0, 1, \$\}$ with the state
set $Q \uplus \{p_0, p_1\}$, the set of final states F and the following transitions:

$\$ \to q_0$,

$i \to p_i$ for $i \in \{0,1\}$,

$f_i(q, p_j) \to q'$ for $(q, (i,j), q') \in \Delta$.

The automaton $\mathcal{A}$ accepts the convolution val(A) ⊗ val(B) if and only if the tree automaton $\mathcal{A}_t$
accepts t(u, v). □
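
The construction can be tried out on explicit strings. The Python sketch below is an illustration under stated assumptions: the comb tree is taken to be $t(u, v) = f_{u_1}(f_{u_2}(\ldots f_{u_n}(\$, v_n) \ldots, v_2), v_1)$, states of the word automaton are strings different from "p0" and "p1", and all function names are hypothetical. The sample word automaton (accepting if the first pair of the convolution is (1,1)) is chosen only for the example and is not the language L from [29].

```python
from itertools import product

PAIRS = [(i, j) for i in "01" for j in "01"]

def word_to_tree_automaton(delta, q0):
    """Tree automaton rules built from word automaton transitions
    (q, (i, j), q2): $ -> q0, i -> p_i, f_i(q, p_j) -> q2."""
    rules = {("$", ()): {q0}}
    for i in "01":
        rules[(i, ())] = {"p" + i}
    for q, (i, j), q2 in delta:
        rules.setdefault(("f" + i, (q, "p" + j)), set()).add(q2)
    return rules

def comb_tree(u, v):
    """Comb tree t(u, v) as nested pairs (label, children); see the
    assumption on its shape stated in the text above."""
    tree = ("$", ())
    for a, b in zip(reversed(u), reversed(v)):
        tree = ("f" + a, (tree, (b, ())))
    return tree

def reachable_states(rules, tree):
    """All states q such that the tree can be rewritten to q."""
    label, children = tree
    child_states = [reachable_states(rules, c) for c in children]
    result = set()
    for combo in product(*child_states):
        result |= rules.get((label, combo), set())
    return result

def accepts_word(delta, q0, final, x, y):
    """Run the word automaton on the convolution of x and y."""
    states = {q0}
    for pair in zip(x, y):
        states = {q2 for q, sym, q2 in delta if q in states and sym == pair}
    return bool(states & final)

def accepts_comb(delta, q0, final, x, y):
    """Run the derived tree automaton on t(rev(x), rev(y))."""
    rules = word_to_tree_automaton(delta, q0)
    tree = comb_tree(x[::-1], y[::-1])
    return bool(reachable_states(rules, tree) & final)

# Sample word automaton: accept iff the first pair is (1, 1).
delta = [("start", s, "yes" if s == ("1", "1") else "no") for s in PAIRS]
delta += [(q, s, q) for q in ("yes", "no") for s in PAIRS]
print(accepts_word(delta, "start", {"yes"}, "110", "101"),
      accepts_comb(delta, "start", {"yes"}, "110", "101"))  # True True
```

Because of the two reversals, the innermost node of the comb carries the first letters of val(A) and val(B), so the bottom-up run of the tree automaton reads the convolution from left to right.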

The PSPACE-hardness result in Theorem [25] can also be interpreted as follows: There exists
a fixed finite algebra for which the evaluation problem for SLP-compressed trees is PSPACE-complete.
This is a bit surprising if we compare the situation with dags or TSLP-compressed
trees. For these, membership for tree automata is still doable in polynomial time [34], whereas the
evaluation problem of arithmetic expressions (in the sense of computing a certain bit of the output
number) belongs to the counting hierarchy and is #P-hard. In contrast, for SLP-compressed
trees, the evaluation problem for finite algebras (i.e., tree automata) is harder than the evaluation
problem for arithmetic expressions (PSPACE versus the counting hierarchy).


6 Further research 

We conjecture that in practice, grammar-based tree compression based on SLPs leads to faster 
compression and better compression ratios compared to grammar-based tree compression based 
on TSLPs, and we plan to substantiate this conjecture with experiments on real tree data. The 
theoretical results from Section Uj indicate that SLPs may achieve better compression ratios than 
TSLPs. Moreover, grammar-based string compression can be implemented without pointer structures,
whereas all grammar-based tree compressors (that construct TSLPs) we are aware of work
with pointer structures for trees, and a string-encoded tree (e.g. an XML document) must be 
first transformed into a pointer structure. Moreover, we believe that SLPs can be encoded more 
succinctly than TSLPs (for instance, we do not have to store the ranks of nonterminals). 